Title Of The Invention

NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO BACTEROIDES 
FRAGILIS FOR DIAGNOSTICS AND THERAPEUTICS 

INVENTOR: Gary L. Breton

Related Applications: 

This application claims the benefit of U.S. Provisional Application Serial Number
60/128,705, filed April 9, 1999, the entire teachings of which are incorporated herein by
reference.



Field Of The Invention 

The invention relates to isolated nucleic acids and polypeptides derived from 
Bacteroides fragilis that are useful as molecular targets for diagnostics, prophylaxis and 
treatment of pathological conditions, as well as materials and methods for the diagnosis, 
prevention, and amelioration of pathological conditions resulting from bacterial 
infection. 




BACKGROUND OF THE INVENTION 



The genus Bacteroides is a member of the family Bacteroidaceae. Its members are
Gram-negative, obligately anaerobic, nonsporeforming rods. The genus contains at least
39 species, which are often isolated from sewage as well as from the digestive tract of man,
animals, and insects. Bacteroides fragilis was first described in 1898 by Veillon and
Zuber, but was called Bacillus fragilis. In 1919, Castellani and Chalmers transferred it to
the Bacteroides genus. The "B. fragilis group" refers to the saccharoclastic bacteroids
that grow well in bile. Members of this group were previously subspecies of B. fragilis
and include B. fragilis, B. distasonis, B. ovatus, B. thetaiotaomicron, and B. vulgatus
(Castellani and Chalmers. 1984. Genus I. Bacteroides 1919, 959. Krieg and Holt (editors)
In Bergey's Manual of Systematic Bacteriology, 1:604-631).

Bacteroides fragilis accounts for only 1% of the normal flora of the human colon,
but is the most common anaerobe isolated from clinical specimens. It is associated with
soft tissue infections, abscesses and bacteremia (Moncrief, J., et al., 1998. Infect. Immun.
66:1735-1739). B. fragilis has also been associated with infection of the skeletal muscle
(Katagiri, K., et al., 1996. J. Dermatology. 23:129-132), and meningitis (Aucher, P., et al.,
1996. Eur. J. Clin. Microbiol. Infect. Dis. 15:820-823). The B. fragilis group is
responsible for 65% of all anaerobic bacteremia cases, with mortality rates in excess of
19% (Redondo, M., et al., 1995. Clinical Infectious Diseases. 20:1492-1496).

In 1984, strains of B. fragilis were found to cause diarrhea in newborn lambs
(Myers, L., et al., 1984. Infect. Immun. 44:241-244). Subsequently, it has been shown that
B. fragilis is associated with diarrhea in other livestock and young children. These strains
are called enterotoxigenic strains because they produce a 20 kDa metalloprotease
enterotoxin with intestinal secretory activity (Moncrief, J., et al., 1995. Infect. Immun.
63:175-181).

There has been an increase in antibiotic resistance within the Bacteroides fragilis
group. While many antibiotics still have excellent activity, even some of the most
potent agents, the carbapenems and the β-lactamase-inhibitor combinations, are losing
activity (Snydman, D., et al., 1996. Clinical Infectious Diseases. 23:S54-65). The
cefoxitin resistance rate has increased from 0% in 1987 to 22% in 1995 (Bianchini, H.,
et al., 1997. Clinical Infectious Diseases. 25:S268-269). Resistance to metronidazole,
co-amoxiclav, and imipenem is rare, but strains have been found that are resistant to one or
all of these antibiotics (Turner, P., et al., 1995. The Lancet. 345:1275-1277). Clindamycin
resistance has been shown to be transferred between strains by either plasmid or
transposon mechanisms (Dalmau, D., et al., 1997. Clinical Infectious Diseases. 24:874-877).
The increasing resistance to antibiotics commonly used against Bacteroides
species may eventually lead to failures of these treatments.

Sequencing and analysis of this genome is crucial for the identification of
essential genes for development of drug targets and to reduce the emerging health threat
this organism poses.

SUMMARY OF THE INVENTION 
The present invention fulfills the need for diagnostic tools and therapeutics by
providing bacterial-specific compositions and methods for detecting Bacteroides species
including B. fragilis, as well as compositions and methods useful for treating and
preventing Bacteroides infection, in particular, B. fragilis infection, in vertebrates
including mammals.

The present invention encompasses isolated nucleic acids and polypeptides
derived from B. fragilis that are useful as reagents for diagnosis of bacterial disease,
components of effective antibacterial vaccines, and/or as targets for antibacterial drugs
including anti-B. fragilis drugs. They can also be used to detect the presence of B.
fragilis and other Bacteroides species in a sample; and in screening compounds for the
ability to interfere with the B. fragilis life cycle or to inhibit B. fragilis infection. They
also have use as biocontrol agents for plants.

In one aspect, the invention features compositions of nucleic acids corresponding
to entire coding sequences of B. fragilis proteins, including surface or secreted proteins
or parts thereof, nucleic acids capable of binding mRNA from B. fragilis proteins to


2709.1001-001 



block protein translation, and methods for producing B. fragilis proteins or parts thereof
using peptide synthesis and recombinant DNA techniques. This invention also features
antibodies and nucleic acids useful as probes to detect B. fragilis infection. In addition,
vaccine compositions and methods for the protection or treatment of infection by B.
fragilis are within the scope of this invention.

The nucleotide sequences provided in SEQ ID NO: 1 - SEQ ID NO: 5222, a
fragment thereof, or a nucleotide sequence at least about 99.5% identical to a sequence
contained within SEQ ID NO: 1 - SEQ ID NO: 5222 may be "provided" in a variety of
media to facilitate use thereof. As used herein, "provided" refers to a manufacture,
other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the
present invention, i.e., the nucleotide sequence provided in SEQ ID NO: 1 - SEQ ID NO:
5222, a fragment thereof, or a nucleotide sequence at least about 99.5% identical to a
sequence contained within SEQ ID NO: 1 - SEQ ID NO: 5222. Uses for and methods for
providing nucleotide sequences in a variety of media are well known in the art (see e.g.,
EPO Publication No. EP 0 756 006).

In one application of this embodiment, a nucleotide sequence of the present
invention can be recorded on computer readable media. As used herein, "computer
readable media" refers to any media which can be read and accessed directly by a
computer. Such media include, but are not limited to: magnetic storage media, such as
floppy discs, hard disc storage media, and magnetic tape; optical storage media such as
CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these
categories such as magnetic/optical storage media. A person skilled in the art can
readily appreciate how any of the presently known computer readable media can be used
to create a manufacture comprising computer readable media having recorded thereon a
nucleotide sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on
computer readable media. A person skilled in the art can readily adopt any of the
presently known methods for recording information on computer readable media to
generate manufactures comprising the nucleotide sequence information of the present
invention.

A variety of data storage structures are available to a person skilled in the art for
creating computer readable media having recorded thereon a nucleotide sequence of the
present invention. The choice of the data storage structure will generally be based on the
means chosen to access the stored information. In addition, a variety of data processor
programs and formats can be used to store the nucleotide sequence information of the
present invention on computer readable media. The sequence information can be
represented in a word processing text file, formatted in commercially-available software
such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file,
stored in a database application, such as DB2, Sybase, Oracle, or the like. A person
skilled in the art can readily adapt any number of data processor structuring formats (e.g.
text file or database) in order to obtain computer readable media having recorded thereon
the nucleotide sequence information of the present invention.
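As an illustration of the two storage formats named above (an ASCII text file and a
database), the following minimal Python sketch renders sequence records as FASTA-style
ASCII text and stores the same records in a relational table. It is not part of the
disclosed method. The FASTA layout, the use of SQLite in place of DB2, Sybase, or Oracle,
the table and column names, and the identifier "SEQ_ID_NO_1" with its toy sequence are all
assumptions chosen for the example.

```python
import sqlite3


def to_fasta(records, width=60):
    """Render (identifier, sequence) pairs as FASTA-style ASCII text."""
    lines = []
    for ident, seq in records:
        lines.append(f">{ident}")
        # Wrap each sequence at a fixed width so the file stays readable.
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Read FASTA-style text back into a list of (identifier, sequence) pairs."""
    records = []
    ident, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:
                records.append((ident, "".join(chunks)))
            ident, chunks = line[1:], []
        else:
            chunks.append(line)
    if ident is not None:
        records.append((ident, "".join(chunks)))
    return records


def store_in_database(records, path=":memory:"):
    """Store records in a SQLite table (hypothetical schema for illustration)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences ("
        "seq_id TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO sequences VALUES (?, ?)", records)
    conn.commit()
    return conn


# Toy record; the residues are placeholders, not a sequence from the Sequence Listing.
toy_records = [("SEQ_ID_NO_1", "ATGC" * 20)]
fasta_text = to_fasta(toy_records)
connection = store_in_database(toy_records)
```

Either representation holds the same information; as the text notes, the choice would
generally follow from how the stored sequences are to be accessed.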

By providing the nucleotide sequence of SEQ ID NO: 1 - SEQ ID NO: 5222, a
fragment thereof, or a nucleotide sequence at least about 99.5% identical to SEQ ID NO:
1 - SEQ ID NO: 5222 in computer readable form, a person skilled in the art can routinely
access the coding sequence information for a variety of purposes. Computer software is
publicly available which allows a person skilled in the art to access sequence information
provided in computer readable media. Examples of such computer software include
programs of the "Staden Package", "DNA Star", "MacVector", GCG "Wisconsin
Package" (Genetics Computer Group, Madison, WI) and "NCBI Toolbox" (National
Center For Biotechnology Information). Suitable programs are described, for example,
in Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition, Academic
Press, San Diego, CA. (1998); and Leonard F. Peruski, Jr., and Anne Harwood Peruski,
The Internet and the New Biology: Tools for Genomic and Molecular Research,
American Society for Microbiology, Washington, D.C. (1997).

Computer algorithms enable the identification of B. fragilis open reading frames
(ORFs) within SEQ ID NO: 1 - SEQ ID NO: 5222 which contain homology to ORFs or
proteins from other organisms. Examples of such similarity-search algorithms include
the BLAST [Altschul et al., J. Mol. Biol. 215:403-410 (1990)] and Smith-Waterman
[Smith and Waterman (1981) Advances in Applied Mathematics, 2:482-489] search
algorithms. Suitable search algorithms are described, for example, in Martin J. Bishop,
ed., Guide to Human Genome Computing, 2d Edition, Academic Press, San Diego, CA.
(1998); and Leonard F. Peruski, Jr., and Anne Harwood Peruski, The Internet and the
New Biology: Tools for Genomic and Molecular Research, American Society for
Microbiology, Washington, D.C. (1997). Such algorithms are utilized on computer
systems as exemplified below. The ORFs so identified represent protein encoding
fragments within the B. fragilis genome and are useful in producing commercially
important proteins such as enzymes used in fermentation reactions and in the production
of commercially useful metabolites.
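To make the notion of an open reading frame concrete, the following Python sketch scans
both strands of a DNA string in all three frames and reports each run that begins with ATG
and ends at the first in-frame stop codon. It is a simplified illustration only. It is not
the homology-based procedure described above, and the function names, the ATG-only start
rule, and the min_codons cutoff are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """List (strand, frame, orf) tuples for ATG...stop runs on both strands.

    min_codons counts the codons from the start codon up to, but not
    including, the stop codon. The returned orf string includes the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, dna in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(dna) - 2, 3):
                codon = dna[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, dna[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


# Toy input, not a sequence from the Sequence Listing.
print(find_orfs("CCATGAAATTTGGGTAACC"))
# prints [('+', 2, 'ATGAAATTTGGGTAA')]
```

A candidate ORF found this way would then be translated and compared against known
proteins with a similarity-search program of the kind cited above.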

The present invention further provides systems, particularly computer-based
systems, which contain the sequence information described herein. Such systems are
designed to identify commercially important fragments of the B. fragilis genome. As
used herein, "a computer-based system" refers to the hardware means, software means,
and data storage means used to analyze the nucleotide sequence information of the
present invention. The minimum hardware means of the computer-based systems of the
present invention comprises a central processing unit (CPU), input means, output means,
and data storage means. A person skilled in the art can readily appreciate that any one of
the currently available computer-based systems is suitable for use in the present
invention. The computer-based systems of the present invention comprise a data storage
means having stored therein a nucleotide sequence of the present invention and the
necessary hardware means and software means for supporting and implementing a search
means. As used herein, "data storage means" refers to memory which can store
nucleotide sequence information of the present invention, or a memory access means
which can access manufactures having recorded thereon the nucleotide sequence
information of the present invention.

As used herein, "search means" refers to one or more programs which are
implemented on the computer-based system to compare a target sequence or target
structural motif with the sequence information stored within the data storage means.
Search means are used to identify fragments or regions of the B. fragilis genome which
are similar to, or "match", a particular target sequence or target motif. A variety of
algorithms are known in the art and have been disclosed publicly, and a variety of
commercially available software packages for conducting homology-based similarity
searches are available and can be used in the computer-based systems of the present invention.
Examples of such software include, but are not limited to, FASTA (GCG Wisconsin
Package), Bic_SW (Compugen Bioccelerator), BLASTN2, BLASTP2, BLASTX2
(NCBI) and Motifs (GCG). Suitable software programs are described, for example, in
Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition, Academic Press,
San Diego, CA. (1998); and Leonard F. Peruski, Jr., and Anne Harwood Peruski, The
Internet and the New Biology: Tools for Genomic and Molecular Research, American
Society for Microbiology, Washington, D.C. (1997). A person skilled in the art can
readily recognize that any one of the available algorithms or implementing software
packages for conducting homology searches can be adapted for use in the present
computer-based systems.

As used herein, a "target sequence" can be any DNA or amino acid sequence of
six or more nucleotides or two or more amino acids. A person skilled in the art can
readily recognize that the longer a target sequence is, the less likely a target sequence will
be present as a random occurrence in the database. The most preferred sequence length of
a target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide
residues. However, it is well recognized that many genes are longer than 500 amino
acids, or 1.5 kb in length, and that commercially important fragments of the B. fragilis
genome, such as sequence fragments involved in gene expression and protein processing,
will often be shorter than 30 nucleotides.
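The relationship between target length and chance occurrence can be made quantitative
with a short calculation. The Python sketch below estimates the expected number of exact
chance matches of a target of a given length in a database of a given size, under the
simplifying assumption (introduced here, not stated in the text) that residues are
independent and equally frequent. The database size of 5,000,000 residues is a
hypothetical figure for illustration only.

```python
def expected_random_matches(target_length, database_length, alphabet_size=4):
    """Expected count of exact chance matches on one strand.

    Assumes independent, uniformly distributed residues: each of the
    (database_length - target_length + 1) start positions matches with
    probability alphabet_size ** -target_length.
    """
    if target_length > database_length:
        return 0.0
    positions = database_length - target_length + 1
    return positions * alphabet_size ** -target_length


# Hypothetical database of 5,000,000 nucleotides.
for length in (6, 12, 30):
    print(length, expected_random_matches(length, 5_000_000))
```

Under these assumptions a six-nucleotide target is expected to occur by chance more than a
thousand times in such a database, whereas a 30-nucleotide target is expected to occur by
chance far less than once, which is consistent with the preference for longer targets.
For amino acid targets, alphabet_size would be set to 20.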

As used herein, "a target structural motif," or "target motif," refers to any
rationally selected sequence or combination of sequences in which the sequence(s) are
chosen based on a specific functional domain or three-dimensional configuration which
is formed upon the folding of the target polypeptide. There are a variety of target motifs
known in the art. Protein target motifs include, but are not limited to, enzymatic active
sites, membrane-spanning regions, and signal sequences. Nucleic acid target motifs
include, but are not limited to, promoter sequences, hairpin structures and inducible
expression elements (protein binding sequences).
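As one example of searching for a nucleic acid target motif, the Python sketch below
looks for candidate hairpin structures, modelled here as an exact inverted repeat (a stem)
separated by a short loop. The model, the function names, and the stem and loop length
parameters are assumptions made for illustration. Real motif-search programs such as those
cited in this specification use richer models.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq, stem=4, min_loop=3, max_loop=8):
    """Return (start, loop_length, left_arm, right_arm) for exact inverted repeats.

    A hit means the right arm is the reverse complement of the left arm,
    so the two arms could base-pair to form a stem around the loop.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            right = seq[j:j + stem]
            if len(right) < stem:
                break  # ran off the end of the sequence
            if right == reverse_complement(left):
                hits.append((i, loop, left, right))
    return hits


# Toy input, not a sequence from the Sequence Listing.
print(find_hairpins("TTGGACAAAAGTCCTT"))
# prints [(2, 4, 'GGAC', 'GTCC')]
```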

A variety of structural formats for the input and output means can be used to
input and output the information in the computer-based systems of the present invention.
A preferred format for an output means ranks fragments of the B. fragilis genome
possessing varying degrees of homology to the target sequence or target motif. Such
presentation provides a person skilled in the art with a ranking of sequences which
contain various amounts of the target sequence or target motif and identifies the degree
of homology contained in the identified fragment.
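The search means and ranked output described above can be sketched in a few lines of
Python. The example scores a target against each stored fragment with a basic
Smith-Waterman local alignment and lists the fragments from highest to lowest score. It
is a teaching sketch under stated assumptions: the scoring values (match 2, mismatch -1,
linear gap -2), the function names, and the toy fragments are hypothetical, and the
programs cited in this specification use substitution matrices and statistics that are
not reproduced here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_fragments(target, fragments):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, smith_waterman_score(target, seq))
              for name, seq in fragments.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy fragments standing in for stored genome fragments.
toy_fragments = {
    "frag_a": "GGACGGG",
    "frag_b": "TTACGTTT",
    "frag_c": "CCCCCC",
}
for name, score in rank_fragments("TTACGTTT", toy_fragments):
    print(name, score)
```

The dynamic-programming table is kept one row at a time because only the best score is
needed for ranking; recovering the aligned region itself would require keeping the full
table and tracing back from the highest-scoring cell.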

A variety of comparing means can be used to compare a target sequence or target
motif with the data storage means to identify sequence fragments of the B. fragilis
genome. In the present examples, software implementing the
BLASTP2 and Bic_SW algorithms (Altschul et al., J. Mol. Biol. 215:403-410 (1990);
Compugen Bioccelerator) was used to identify open reading frames within the B. fragilis
genome. A person skilled in the art can readily recognize that any one of the publicly
available homology search programs can be used as the search means for the
computer-based systems of the present invention. Suitable programs are described, for example, in
Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition, Academic Press,
San Diego, CA. (1998); and Leonard F. Peruski, Jr., and Anne Harwood Peruski, The
Internet and the New Biology: Tools for Genomic and Molecular Research, American
Society for Microbiology, Washington, D.C. (1997).

The invention features B. fragilis polypeptides, preferably a substantially pure
preparation of a B. fragilis polypeptide, or a recombinant B. fragilis polypeptide. In
preferred embodiments: the polypeptide has biological activity; the polypeptide has an
amino acid sequence at least about 60%, 70%, 80%, 90%, 95%, 98%, or 99% identical to
an amino acid sequence of the invention contained in the Sequence Listing, preferably it
has about 65% sequence identity with an amino acid sequence of the invention contained
in the Sequence Listing, and most preferably it has about 92% to about 99% sequence
identity with an amino acid sequence of the invention contained in the Sequence Listing;
the polypeptide has an amino acid sequence essentially the same as an amino acid
sequence of the invention contained in the Sequence Listing; the polypeptide is at least
about 5, 10, 20, 50, 100, or 150 amino acid residues in length; the polypeptide includes at
least about 5, preferably at least about 10, more preferably at least about 20, still more
preferably at least about 50, 100, or 150 contiguous amino acid residues of the invention
contained in the Sequence Listing. In yet another preferred embodiment, an amino acid
sequence which differs in sequence identity by about 7% to about 8% from the B. fragilis
amino acid sequences of the invention contained in the Sequence Listing is also
encompassed by the invention.

In preferred embodiments, the B. fragilis polypeptide is encoded by a nucleic
acid of the invention contained in the Sequence Listing, or by a nucleic acid having at
least about 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a nucleic acid of
the invention contained in the Sequence Listing.
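The percentage thresholds recited above presuppose some way of computing sequence
identity. The Python sketch below shows one common convention, offered as an assumption
since the text here does not fix a formula: identity is the number of identical aligned
positions divided by the number of alignment columns, computed on two sequences that have
already been aligned with "-" marking gaps. The function names and the toy sequences are
hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pairwise alignment.

    Columns where both sequences have a gap are ignored; a gap opposite
    a residue counts as a non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=60.0):
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy aligned peptides: four of six columns are identical.
print(round(percent_identity("MKT-LV", "MKSALV"), 1))
# prints 66.7
```

Other conventions divide by the length of the shorter sequence or exclude gapped columns
entirely, so a reported percentage is only meaningful together with the convention used.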

In a preferred embodiment, the subject B. fragilis polypeptide differs in amino
acid sequence at about 1, 2, 3, 5, 10 or more residues from a sequence of the invention
contained in the Sequence Listing. The differences, however, are such that the B. fragilis
polypeptide exhibits a B. fragilis biological activity, e.g., the B. fragilis polypeptide
retains a biological activity of a naturally occurring B. fragilis enzyme.

In preferred embodiments, the polypeptide includes all or a fragment of an amino
acid sequence of the invention contained in the Sequence Listing; fused, in reading
frame, to additional amino acid residues, preferably to residues encoded by genomic
DNA 5' or 3' to the genomic DNA which encodes a sequence of the invention contained
in the Sequence Listing.

In yet other preferred embodiments, the B. fragilis polypeptide is a recombinant
fusion protein having a first B. fragilis polypeptide portion and a second polypeptide
portion, e.g., a second polypeptide portion having an amino acid sequence unrelated to B.
fragilis. The second polypeptide portion can be, e.g., any of glutathione-S-transferase, a
DNA binding domain, or a polymerase activating domain. In a preferred embodiment, the
fusion protein can be used in a two-hybrid assay.



Polypeptides of the invention include those which arise as a result of alternative
transcription events, alternative RNA splicing events, and alternative translational and
post-translational events.



In a preferred embodiment, the encoded B. fragilis polypeptide differs (e.g., by
amino acid substitution, addition or deletion of at least one amino acid residue) in amino
acid sequence at about 1, 2, 3, 5, 10 or more residues, from a sequence of the invention
contained in the Sequence Listing. The differences, however, are such that: the B.
fragilis encoded polypeptide exhibits a B. fragilis biological activity, e.g., the encoded
B. fragilis enzyme retains a biological activity of a naturally occurring B. fragilis enzyme.

In preferred embodiments, the encoded polypeptide includes all or a fragment of
an amino acid sequence of the invention contained in the Sequence Listing; fused, in
reading frame, to additional amino acid residues, preferably to residues encoded by
genomic DNA 5' or 3' to the genomic DNA which encodes a sequence of the invention
contained in the Sequence Listing.

The B. fragilis strain, 14062, from which genomic sequences have been
sequenced, was deposited on July 20, 1998, in the American Type Culture
Collection and assigned the ATCC designation # 202158.

Included in the invention are: allelic variations; natural mutants; induced
mutants; proteins encoded by DNA that hybridizes under high or low stringency
conditions to a nucleic acid which encodes a polypeptide of the invention contained in
the Sequence Listing (for definitions of high and low stringency see Current Protocols in
Molecular Biology, John Wiley & Sons, New York, 1989, 6.3.1 - 6.3.6, hereby
incorporated by reference); and, polypeptides specifically bound by antisera to B. fragilis
polypeptides, especially by antisera to an active site or binding domain of a B. fragilis
polypeptide. The invention also includes fragments, preferably biologically active
fragments. These and other polypeptides are also referred to herein as B. fragilis
polypeptide analogs or variants.

The invention further provides nucleic acids, e.g., RNA or DNA and their
respective complements, encoding a polypeptide of the invention. This includes double
stranded nucleic acids as well as coding and antisense single strands.

In preferred embodiments, the subject B. fragilis nucleic acid will include a
transcriptional regulatory sequence, e.g., at least one of a transcriptional promoter or
transcriptional enhancer sequence, operably linked to the B. fragilis gene sequence, e.g.,
to render the B. fragilis gene sequence suitable for expression in a recombinant host cell.

In yet a further preferred embodiment, the nucleic acid which encodes a B.
fragilis polypeptide of the invention hybridizes under stringent conditions to a nucleic
acid probe corresponding to at least about 8 consecutive nucleotides of the invention
contained in the Sequence Listing; more preferably to at least about 12 consecutive
nucleotides of the invention contained in the Sequence Listing; still more preferably to at
least about 20 consecutive nucleotides of the invention contained in the Sequence
Listing; most preferably to at least about 40 consecutive nucleotides of the invention
contained in the Sequence Listing.

In another aspect, the invention provides a substantially pure nucleic acid having
a nucleotide sequence which encodes a B. fragilis polypeptide. In preferred
embodiments: the encoded polypeptide has biological activity; the encoded polypeptide
has an amino acid sequence at least about 60%, 70%, 80%, 90%, 95%, 98% or 99%
homologous to an amino acid sequence of the invention contained in the Sequence
Listing; the encoded polypeptide has an amino acid sequence essentially the same as an
amino acid sequence of the invention contained in the Sequence Listing; the encoded
polypeptide is at least about 5, 10, 20, 50, 100, or 150 amino acids in length; the encoded
polypeptide comprises at least about 5, preferably at least about 10, more preferably at
least about 20, still more preferably at least about 50, 100, or 150 contiguous amino acids
of the invention contained in the Sequence Listing.

In another aspect, the invention encompasses: a vector including a nucleic acid
which encodes a B. fragilis polypeptide or a B. fragilis polypeptide variant as
described herein; a host cell transfected with the vector; and a method of producing a
recombinant B. fragilis polypeptide or B. fragilis polypeptide variant, including culturing
the cell, e.g., in a cell culture medium, and isolating a B. fragilis polypeptide or B. fragilis
polypeptide variant, e.g., from the cell or from the cell culture medium.

One embodiment of the invention is directed to substantially isolated nucleic 
acids. Nucleic acids of the invention include sequences comprising at least about 8 
nucleotides in length, more preferably at least about 12 nucleotides in length, even more 
preferably at least about 15-20 nucleotides in length, that correspond to a subsequence of 
any one of SEQ ID NO: 1 - SEQ ID NO: 5222 or complements thereof. Alternatively, 
the nucleic acids comprise sequences contained within any ORF (open reading frame), 
including a complete protein-coding sequence, of which any of SEQ ID NO: 1 - SEQ ID 
NO: 5222 forms a part. The invention encompasses sequence-conservative variants and 
function-conservative variants of these sequences. The nucleic acids may be DNA, 
RNA, DNA/RNA duplexes, protein-nucleic acid (PNA), or derivatives thereof. 

In another aspect, the invention features a purified recombinant nucleic acid
having at least about 50%, 60%, 70%, 80%, 90%, 95%, 98%, or 99% sequence identity
or % homology with a sequence of the invention contained in the Sequence Listing.

The invention also encompasses recombinant DNA (including DNA cloning and
expression vectors) comprising these B. fragilis-derived sequences; host cells
comprising such DNA, including fungal, bacterial, yeast, plant, insect, and mammalian
host cells; and methods for producing expression products comprising RNA and
polypeptides encoded by the B. fragilis sequences. These methods are carried out by
incubating a host cell comprising a B. fragilis-derived nucleic acid sequence under
conditions in which the sequence is expressed. The host cell may be native or
recombinant. The polypeptides can be obtained by (a) harvesting the incubated cells to
produce a cell fraction and a medium fraction; and (b) recovering the B. fragilis
polypeptide from the cell fraction, the medium fraction, or both. The polypeptides can
also be made by in vitro translation.

In another aspect, the invention features nucleic acids capable of binding mRNA
of B. fragilis. Such nucleic acid is capable of acting as antisense nucleic acid to control
the translation of mRNA of B. fragilis. A further aspect features a nucleic acid which is
capable of binding specifically to a B. fragilis nucleic acid. These nucleic acids are also
referred to herein as complements and have utility as probes and as capture reagents.

In another aspect, the invention features an expression system comprising an open
reading frame corresponding to B. fragilis nucleic acid. The nucleic acid further
comprises a control sequence compatible with an intended host. The expression system
is useful for making polypeptides corresponding to B. fragilis nucleic acid.

Yet another embodiment of the invention encompasses reagents for detecting
bacterial infection, including B. fragilis infection, which comprise at least one B.
fragilis-derived nucleic acid defined by any one of SEQ ID NO: 1 - SEQ ID NO: 5222, or
sequence-conservative or function-conservative variants thereof. Alternatively, the
diagnostic reagents comprise nucleotide sequences that are contained within any open
reading frames (ORFs), including preferably complete protein-coding sequences,
contained within any of SEQ ID NO: 1 - SEQ ID NO: 5222, or polypeptide sequences
contained within any of SEQ ID NO: 5223 - SEQ ID NO: 10444, or polypeptides of
which any of the above sequences forms a part, or antibodies directed against any of the
above peptide sequences or function-conservative variants and/or fragments thereof.

The invention further provides antibodies, preferably monoclonal antibodies,
which specifically bind to the polypeptides of the invention. Methods are also provided
for producing antibodies in a host animal. The methods of the invention comprise
immunizing an animal with at least one B. fragilis-derived immunogenic component,
wherein the immunogenic component comprises one or more of the polypeptides
encoded by any one of SEQ ID NO: 1 - SEQ ID NO: 5222 or sequence-conservative or
function-conservative variants thereof; or polypeptides that are contained within any
ORFs, including complete protein-coding sequences, of which any of SEQ ID NO: 1 -
SEQ ID NO: 5222 forms a part; or polypeptide sequences contained within any of SEQ
ID NO: 5223 - SEQ ID NO: 10444; or polypeptides of which any of SEQ ID NO: 5223 -
SEQ ID NO: 10444 forms a part. Host animals include any warm blooded animal,
including without limitation mammals and birds. Such antibodies have utility as
reagents for immunoassays to evaluate the abundance and distribution of B.
fragilis-specific antigens.

In yet another aspect, the invention provides diagnostic methods for detecting B.
fragilis antigenic components or anti-B. fragilis antibodies in a sample. B. fragilis
antigenic components may be detected by known processes, including but not limited to
detection by a process comprising: (i) contacting a sample suspected to contain a
bacterial antigenic component with a bacterial-specific antibody, under conditions in
which a stable antigen-antibody complex can form between the antibody and bacterial
antigenic components in the sample; and (ii) detecting any antigen-antibody complex
formed in step (i), wherein detection of an antigen-antibody complex indicates the
presence of at least one bacterial antigenic component in the sample. In different
embodiments of this method, the antibodies used are directed against a sequence encoded
by any of SEQ ID NO: 1 - SEQ ID NO: 5222 or sequence-conservative or
function-conservative variants thereof, or against a polypeptide sequence contained in any of SEQ
ID NO: 5223 - SEQ ID NO: 10444 or function-conservative variants thereof.

In yet another aspect, the invention provides a method for detecting
antibacterial-specific antibodies in a sample, which comprises: (i) contacting a sample suspected to
contain antibacterial-specific antibodies with a B. fragilis antigenic component, under
conditions in which a stable antigen-antibody complex can form between the B. fragilis
antigenic component and antibacterial antibodies in the sample; and (ii) detecting any
antigen-antibody complex formed in step (i), wherein detection of an antigen-antibody
complex indicates the presence of antibacterial antibodies in the sample. In different
embodiments of this method, the antigenic component is encoded by a sequence
contained in any of SEQ ID NO: 1 - SEQ ID NO: 5222 or sequence-conservative and
function-conservative variants thereof, or is a polypeptide sequence contained in any of
SEQ ID NO: 5223 - SEQ ID NO: 10444 or function-conservative variants thereof.

In another aspect, the invention features a method of generating vaccines for
immunizing an individual against B. fragilis. The method includes: immunizing a
subject with a B. fragilis polypeptide, e.g., a surface or secreted polypeptide, or a
combination of such peptides or active portion(s) thereof, and a pharmaceutically
acceptable carrier. Such vaccines have therapeutic and prophylactic utilities.

In another aspect, the invention features a method of evaluating a compound, e.g.,
a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind a B.
fragilis polypeptide. The method includes contacting the compound to be evaluated with
a B. fragilis polypeptide and determining if the compound binds or otherwise interacts
with the B. fragilis polypeptide. Compounds which bind or otherwise interact with B.
fragilis polypeptides are candidates as modulators, including activators and inhibitors, of
the bacterial life cycle. These assays can be performed in vitro or in vivo.

In another aspect, the invention features a method of evaluating a compound, e.g.,
a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind a B.
fragilis nucleic acid, e.g., DNA or RNA. The method includes contacting the compound
to be evaluated with a B. fragilis nucleic acid and determining if the compound binds or
otherwise interacts with the B. fragilis nucleic acid. Compounds which bind B. fragilis
nucleic acid are candidates as modulators, including activators and inhibitors, of the
bacterial life cycle. These assays can be performed in vitro or in vivo.

A particularly preferred embodiment of the invention is directed to a method of
screening test compounds for anti-bacterial activity, which method comprises: selecting
as a target a bacterial specific sequence, which sequence is essential to the viability of a
bacterial species; contacting a test compound with said target sequence; and selecting
those test compounds which bind to said target sequence as potential anti-bacterial
candidates. In one embodiment, the target sequence selected is specific to a single
species, or even a single strain, such as, for example, the strain B. fragilis 14062. In a
second embodiment, the target sequence is common to at least two species of bacteria.
In a third embodiment, the target sequence is common to a family of bacteria. The target
sequence may be a nucleic acid sequence or a polypeptide sequence. Methods employing
sequences common to more than one species of microorganism may be used to screen
candidates for broad spectrum anti-bacterial activity.

The invention also provides methods for preventing or treating disease caused by
certain bacteria, including B. fragilis, which are carried out by administering to an
animal in need of such treatment, in particular a warm-blooded vertebrate, including but
not limited to birds and mammals, a compound that specifically inhibits or interferes
with the function of a bacterial polypeptide or nucleic acid. In a particularly preferred
embodiment, the mammal to be treated is human.

DETAILED DESCRIPTION OF THE INVENTION

The sequences of the present invention include the specific nucleic acid and
amino acid sequences set forth in the Sequence Listing that forms a part of the present
specification, and which are designated SEQ ID NO: 1 - SEQ ID NO: 10444. Use of the
terms "SEQ ID NO: 1 - SEQ ID NO: 5222", "SEQ ID NO: 5223 - SEQ ID NO: 10444",
"the sequences depicted in Table 2", etc., is intended, for convenience, to refer to each
individual SEQ ID NO individually, and is not intended to refer to the genus of these
sequences unless such reference would be indicated. In other words, it is a shorthand for
listing all of these sequences individually. The invention encompasses each sequence
individually, as well as any combination thereof.

DEFINITIONS

"Nucleic acid" or "polynucleotide" as used herein refers to purine- and
pyrimidine-containing polymers of any length, either polyribonucleotides or
polydeoxyribonucleotides or mixed polyribo-polydeoxyribo nucleotides. This includes
single- and double-stranded molecules, i.e., DNA-DNA, DNA-RNA and RNA-RNA
hybrids, as well as "protein nucleic acids" (PNA) formed by conjugating bases to an
amino acid backbone. This also includes nucleic acids containing modified bases.

A nucleic acid or polypeptide sequence that is "derived from" a designated
sequence refers to a sequence that corresponds to a region of the designated sequence.
For nucleic acid sequences, this encompasses sequences that are homologous or
complementary to the sequence, as well as "sequence-conservative variants" and
"function-conservative variants." For polypeptide sequences, this encompasses
"function-conservative variants." Sequence-conservative variants are those in which a
change of one or more nucleotides in a given codon position results in no alteration in the
amino acid encoded at that position. Function-conservative variants are those in which a
given amino acid residue in a polypeptide has been changed without altering the overall
conformation and function of the native polypeptide, including, but not limited to,
replacement of an amino acid with one having similar physico-chemical properties (such
as, for example, acidic, basic, hydrophobic, and the like). "Function-conservative"
variants also include any polypeptides that have the ability to elicit antibodies specific to
a designated polypeptide.
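
By way of illustration only, the notion of a sequence-conservative variant can be sketched in a few lines of Python. The codon table below is a deliberately small subset of the standard genetic code, and the function names and example sequences are hypothetical, chosen for this sketch rather than taken from the Sequence Listing.

```python
# Small subset of the standard genetic code (DNA codons). Only the codons
# needed for this illustration are included.
CODON_TABLE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "GAT": "D", "GAC": "D",                          # aspartic acid
    "GAA": "E", "GAG": "E",                          # glutamic acid
    "AAA": "K", "AAG": "K",                          # lysine
}


def translate(dna):
    """Translate an in-frame DNA string codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_sequence_conservative(reference, variant):
    """True if the variant differs at the nucleotide level but encodes
    exactly the same amino acid at every codon position."""
    return (
        len(reference) == len(variant)
        and reference != variant
        and translate(reference) == translate(variant)
    )


reference = "GCTGATAAA"  # Ala-Asp-Lys
silent = "GCCGACAAG"     # same amino acids, different codons
changed = "GCTGAAAAA"    # Asp replaced by Glu

print(is_sequence_conservative(reference, silent))
print(is_sequence_conservative(reference, changed))
```

The second variant changes the encoded residue from aspartic acid to glutamic acid. It is therefore not sequence-conservative, although such an acidic-for-acidic replacement could still qualify as function-conservative under the definition above.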
A "B. fragilis-derived" nucleic acid or polypeptide sequence may or may not be
present in other bacterial species, and may or may not be present in all B. fragilis strains.
This term is intended to refer to the source from which the sequence was originally
isolated. Thus, a B. fragilis-derived polypeptide, as used herein, may be used, e.g., as a
target to screen for a broad spectrum antibacterial agent, to search for homologous
proteins in other species of bacteria or in eukaryotic organisms such as humans,
etc.

The terms "purified or isolated polypeptide" and "substantially pure preparation of a
polypeptide" are used interchangeably herein and, as used herein, mean a polypeptide that
has been separated from other proteins, lipids, and nucleic acids with which it naturally
occurs. Preferably, the polypeptide is also separated from substances, e.g., antibodies or
gel matrix, e.g., polyacrylamide, which are used to purify it. Preferably, the polypeptide
constitutes at least about 10, 20, 50, 70, 80 or 95% dry weight of the purified preparation.
Preferably, the preparation contains sufficient polypeptide to allow protein sequencing; at
least about 1, 10, or preferably 100 mg of polypeptide.

A purified preparation of cells refers to, in the case of plant or animal cells, an in
vitro preparation of cells and not an entire intact plant or animal. In the case of cultured
cells or microbial cells, it consists of a preparation of at least about 10%, more preferably
at least about 50%, of the subject cells.

A purified or isolated or a substantially pure nucleic acid, e.g., a substantially
pure DNA (terms used interchangeably herein), is a nucleic acid which is one or both
of the following: not immediately contiguous with both of the coding sequences with
which it is immediately contiguous (i.e., one at the 5' end and one at the 3' end) in the
naturally-occurring genome of the organism from which the nucleic acid is derived; or
which is substantially free of a nucleic acid with which it occurs in the organism from
which the nucleic acid is derived. The term includes, for example, a recombinant DNA
which is incorporated into a vector, e.g., into an autonomously replicating plasmid or
virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a
separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or
restriction endonuclease treatment) independent of other DNA sequences. Substantially
pure DNA also includes a recombinant DNA which is part of a hybrid gene encoding
additional B. fragilis DNA sequence.

A "contig" as used herein is a nucleic acid representing a continuous stretch of 
genomic sequence of an organism. 

An "open reading frame", also referred to herein as ORF, is a region of nucleic 
acid which encodes a polypeptide. This region may represent a portion of a coding 
sequence or a total sequence and can be determined from a stop to stop codon or from a 
start to stop codon. 

As used herein, a "coding sequence" is a nucleic acid which is transcribed into
messenger RNA and/or translated into a polypeptide when placed under the control of
appropriate regulatory sequences. The boundaries of the coding sequence are determined
by a translation start codon at the five prime terminus and a translation stop codon at the
three prime terminus. A coding sequence can include but is not limited to messenger
RNA, synthetic DNA, and recombinant nucleic acid sequences.

A "complement" of a nucleic acid as used herein refers to an anti-parallel or 
antisense sequence that participates in Watson-Crick base-pairing with the original 
sequence. 
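
As a minimal sketch of this definition, assuming plain DNA strings written 5' to 3' and using a hypothetical helper name, the anti-parallel Watson-Crick complement can be computed as follows.

```python
# Watson-Crick pairing for DNA: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the anti-parallel complement of a DNA string (5' to 3')."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATTGCC"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check on any implementation of this kind.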

A "gene product" is a protein or structural RNA which is specifically encoded by
a gene.

As used herein, the term "probe" refers to a nucleic acid, peptide or other
chemical entity which specifically binds to a molecule of interest. Probes are often
associated with or capable of associating with a label. A label is a chemical moiety
capable of detection. Typical labels comprise dyes, radioisotopes, luminescent and
chemiluminescent moieties, fluorophores, enzymes, precipitating agents, amplification
sequences, and the like. Similarly, a nucleic acid, peptide or other chemical entity which
specifically binds to a molecule of interest and immobilizes such molecule is referred
to herein as a "capture ligand". Capture ligands are typically associated with or capable of
associating with a support such as nitro-cellulose, glass, nylon membranes, beads,
particles and the like. The specificity of hybridization is dependent on conditions such as
the base pair composition of the nucleotides, and the temperature and salt concentration
of the reaction. These conditions are readily discernible to one of ordinary skill in the art
using routine experimentation.

"Homologous" refers to the sequence similarity or sequence identity between two
polypeptides or between two nucleic acid molecules. When a position in both of the two
compared sequences is occupied by the same base or amino acid monomer subunit, e.g.,
if a position in each of two DNA molecules is occupied by adenine, then the molecules
are homologous at that position. The percent of homology between two sequences is a
function of the number of matching or homologous positions shared by the two
sequences divided by the number of positions compared x 100. For example, if 6 of 10
of the positions in two sequences are matched or homologous then the two sequences are
60% homologous. By way of example, the DNA sequences ATTGCC and TATGGC
share 50% homology. Generally, a comparison is made when two sequences are aligned
to give maximum homology.
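
The percent homology calculation defined above can be sketched as follows. The sketch assumes the two sequences have already been aligned and are of equal length; it does not perform the alignment itself, and the function name is hypothetical.

```python
def percent_homology(seq_a, seq_b):
    """Matching positions divided by positions compared, times 100.

    Assumes the sequences are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# The example from the text: positions 3, 4 and 6 match, so 3 of 6 positions.
print(percent_homology("ATTGCC", "TATGGC"))  # 50.0
```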

Nucleic acids are hybridizable to each other when at least one strand of a nucleic
acid can anneal to the other nucleic acid under defined stringency conditions. Stringency
of hybridization is determined by: (a) the temperature at which hybridization and/or
washing is performed; and (b) the ionic strength and polarity of the hybridization and
washing solutions. Hybridization requires that the two nucleic acids contain
complementary sequences; depending on the stringency of hybridization, however,
mismatches may be tolerated. Typically, hybridization of two sequences at high
stringency (such as, for example, in a solution of 0.5X SSC, at 65° C) requires that the
sequences be essentially completely homologous. Conditions of intermediate stringency
(such as, for example, 2X SSC at 65° C) and low stringency (such as, for example, 2X
SSC at 55° C) require correspondingly less overall complementarity between the
hybridizing sequences. (1X SSC is 0.15 M NaCl, 0.015 M Na citrate).
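
For convenience, the salt concentrations implied by the example stringency conditions can be worked out from the 1X SSC composition given above. The following is a simple arithmetic sketch; the dictionary layout and function name are assumptions made for illustration only.

```python
# Composition of 1X SSC as stated in the text (molar concentrations).
SSC_1X = {"NaCl": 0.15, "Na citrate": 0.015}

# Example conditions named in the text: (SSC strength, temperature in deg C).
STRINGENCY_EXAMPLES = {
    "high": (0.5, 65),
    "intermediate": (2.0, 65),
    "low": (2.0, 55),
}


def ssc_molarity(fold):
    """Molar concentration of each solute in an SSC solution of given strength."""
    return {solute: round(fold * molar, 6) for solute, molar in SSC_1X.items()}


for name, (fold, temp_c) in STRINGENCY_EXAMPLES.items():
    conc = ssc_molarity(fold)
    print(f"{name}: {fold}X SSC at {temp_c} C -> "
          f"{conc['NaCl']} M NaCl, {conc['Na citrate']} M Na citrate")
```

Thus 0.5X SSC corresponds to 0.075 M NaCl and 0.0075 M Na citrate, and 2X SSC to 0.3 M NaCl and 0.03 M Na citrate.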

The terms peptides, proteins, and polypeptides are used interchangeably herein.

As used herein, the term "surface protein" refers to all surface accessible proteins,
e.g., inner and outer membrane proteins, proteins adhering to the cell wall, and secreted
proteins.

A polypeptide has B. fragilis biological activity if it has one, two or preferably
more of the following properties: (1) when expressed in the course of a B. fragilis
infection, it can promote or mediate the attachment of B. fragilis to a cell; (2) it has an
enzymatic activity, structural or regulatory function characteristic of a B. fragilis
protein; (3) the gene which encodes it can rescue a lethal mutation in a B. fragilis gene.
A polypeptide has biological activity if it is an antagonist, agonist, or super-agonist of a
polypeptide having one of the above-listed properties.

A biologically active fragment or analog is one having an in vivo or in vitro
activity which is characteristic of the B. fragilis polypeptides of the invention contained
in the Sequence Listing, or of other naturally occurring B. fragilis polypeptides, e.g., one
or more of the biological activities described herein. Especially preferred are fragments
which exist in vivo, e.g., fragments which arise from post-transcriptional processing or
which arise from translation of alternatively spliced RNAs. Fragments include those
expressed in native or endogenous cells as well as those made in expression systems,
e.g., in CHO (Chinese Hamster Ovary) cells. Because peptides such as B. fragilis
polypeptides often exhibit a range of physiological properties and because such
properties may be attributable to different portions of the molecule, a useful B. fragilis
fragment or B. fragilis analog is one which exhibits a biological activity in any biological
assay for B. fragilis activity. The fragment or analog possesses about 10%, preferably
about 40%, more preferably about 60%, 70%, 80% or 90% or greater of the activity of B.
fragilis, in any in vivo or in vitro assay.

Analogs can differ from naturally occurring B. fragilis polypeptides in amino acid
sequence or in ways that do not involve sequence, or both. Non-sequence modifications
include changes in acetylation, methylation, phosphorylation, carboxylation, or
glycosylation. Preferred analogs include B. fragilis polypeptides (or biologically active
fragments thereof) whose sequences differ from the wild-type sequence by one or more
conservative amino acid substitutions or by one or more non-conservative amino acid
substitutions, deletions, or insertions which do not substantially diminish the biological
activity of the B. fragilis polypeptide. Conservative substitutions typically include the
substitution of one amino acid for another with similar characteristics, e.g., substitutions
within the following groups: valine, glycine; glycine, alanine; valine, isoleucine, leucine;
aspartic acid, glutamic acid; asparagine, glutamine; serine, threonine; lysine, arginine;
and phenylalanine, tyrosine. Other conservative substitutions can be made in view of the
table below.
TABLE 1

CONSERVATIVE AMINO ACID REPLACEMENTS

For Amino Acid   Code   Replace with any of

Alanine          A      D-Ala, Gly, beta-Ala, L-Cys, D-Cys
Arginine         R      D-Arg, Lys, D-Lys, homo-Arg, D-homo-Arg, Met, Ile,
                        D-Met, D-Ile, Orn, D-Orn
Asparagine       N      D-Asn, Asp, D-Asp, Glu, D-Glu, Gln, D-Gln
Aspartic Acid    D      D-Asp, D-Asn, Asn, Glu, D-Glu, Gln, D-Gln
Cysteine         C      D-Cys, S-Me-Cys, Met, D-Met, Thr, D-Thr
Glutamine        Q      D-Gln, Asn, D-Asn, Glu, D-Glu, Asp, D-Asp
Glutamic Acid    E      D-Glu, D-Asp, Asp, Asn, D-Asn, Gln, D-Gln
Glycine          G      Ala, D-Ala, Pro, D-Pro, beta-Ala, Acp
Isoleucine       I      D-Ile, Val, D-Val, Leu, D-Leu, Met, D-Met
Leucine          L      D-Leu, Val, D-Val, Leu, D-Leu, Met, D-Met
Lysine           K      D-Lys, Arg, D-Arg, homo-Arg, D-homo-Arg, Met, D-Met,
                        Ile, D-Ile, Orn, D-Orn
Methionine       M      D-Met, S-Me-Lys, Ile, D-Ile, Leu, D-Leu, Val, D-Val
Phenylalanine    F      D-Phe, Tyr, D-Thr, L-Dopa, His, D-His, Trp, D-Trp,
                        trans-3,4, or 5-phenylproline, cis-3,4, or
                        5-phenylproline
Proline          P      D-Pro, L-1-thioazolidine-4-carboxylic acid, D- or
                        L-1-oxazolidine-4-carboxylic acid
Serine           S      D-Ser, Thr, D-Thr, allo-Thr, Met, D-Met, Met(O)
Threonine        T      D-Thr, Ser, D-Ser, allo-Thr, Met, D-Met, Met(O),
                        D-Met(O), Val, D-Val
Tyrosine         Y      D-Tyr, Phe, D-Phe, L-Dopa, His, D-His
Valine           V      D-Val, Leu, D-Leu, Ile, D-Ile, Met, D-Met
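
The substitution groups recited in the text preceding Table 1 lend themselves to a simple lookup. The sketch below encodes only those eight groups, in one-letter code, and not the fuller replacement lists of Table 1; the function names and example peptides are hypothetical.

```python
# Substitution groups as listed in the text, in one-letter amino acid code.
CONSERVATIVE_GROUPS = [
    {"V", "G"},       # valine, glycine
    {"G", "A"},       # glycine, alanine
    {"V", "I", "L"},  # valine, isoleucine, leucine
    {"D", "E"},       # aspartic acid, glutamic acid
    {"N", "Q"},       # asparagine, glutamine
    {"S", "T"},       # serine, threonine
    {"K", "R"},       # lysine, arginine
    {"F", "Y"},       # phenylalanine, tyrosine
]


def is_conservative(original, replacement):
    """True if both residues fall within one of the listed groups."""
    return original != replacement and any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )


def classify_substitutions(wild_type, analog):
    """List (position, original, replacement, kind) for each changed residue.

    Assumes two polypeptide strings of equal length (substitutions only).
    """
    if len(wild_type) != len(analog):
        raise ValueError("sequences must have equal length")
    changes = []
    for pos, (wt, an) in enumerate(zip(wild_type, analog), start=1):
        if wt != an:
            kind = "conservative" if is_conservative(wt, an) else "non-conservative"
            changes.append((pos, wt, an, kind))
    return changes


print(classify_substitutions("MKDLS", "MRDLW"))
```

Note that the groups are not transitive as written: valine and alanine each pair with glycine, but the text does not list valine and alanine together, so this sketch treats a valine-to-alanine change as non-conservative.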



Other analogs within the invention are those with modifications which increase
peptide stability; such analogs may contain, for example, one or more non-peptide bonds
(which replace the peptide bonds) in the peptide sequence. Also included are: analogs
that include residues other than naturally occurring L-amino acids, e.g., D-amino acids or
non-naturally occurring or synthetic amino acids, e.g., β or γ amino acids; and cyclic
analogs.

As used herein, the term "fragment", as applied to a B. fragilis analog, will
ordinarily be at least about 20 residues, more typically at least about 40 residues,
preferably at least about 60 residues in length. Fragments of B. fragilis polypeptides can
be generated by methods known to those skilled in the art. The ability of a Bacteroides
fragment to exhibit a biological activity of B. fragilis polypeptide can be assessed by
methods known to those skilled in the art as described herein. Also included are B.
fragilis polypeptides containing residues that are not required for biological activity of
the peptide or that result from alternative mRNA splicing or alternative protein
processing events.

An "immunogenic component" as used herein is a moiety, such as a B. fragilis
polypeptide, analog or fragment thereof, that is capable of eliciting a humoral and/or
cellular immune response in a host animal.

An "antigenic component" as used herein is a moiety, such as a B. fragilis
polypeptide, analog or fragment thereof, that is capable of binding to a specific antibody
with sufficiently high affinity to form a detectable antigen-antibody complex.

The term "antibody" as used herein is intended to include fragments thereof
which are specifically reactive with B. fragilis polypeptides.

As used herein, the term "cell-specific promoter" means a DNA sequence that
serves as a promoter, i.e., regulates expression of a selected DNA sequence operably
linked to the promoter, and which effects expression of the selected DNA sequence in
specific cells of a tissue. The term also covers so-called "leaky" promoters, which
regulate expression of a selected DNA primarily in one tissue, but cause expression in
other tissues as well.

Misexpression, as used herein, refers to a non-wild type pattern of gene
expression. It includes: expression at non-wild type levels, i.e., over or under expression;
a pattern of expression that differs from wild type in terms of the time or stage at which
the gene is expressed, e.g., increased or decreased expression (as compared with wild
type) at a predetermined developmental period or stage; a pattern of expression that
differs from wild type in terms of increased expression (as compared with wild type) in a
predetermined cell type or tissue type; a pattern of expression that differs from wild type
in terms of the splicing size, amino acid sequence, post-translational modification, or
biological activity of the expressed polypeptide; a pattern of expression that differs from
wild type in terms of the effect of an environmental stimulus or extracellular stimulus on
expression of the gene, e.g., a pattern of increased or decreased expression (as compared
with wild type) in the presence of an increase or decrease in the strength of the stimulus.

As used herein, "host cells" and other such terms denoting microorganisms or
higher eukaryotic cell lines cultured as unicellular entities refer to cells which can
become or have been used as recipients for a recombinant vector or other transfer DNA,
and include the progeny of the original cell which has been transfected. It is understood
by individuals skilled in the art that the progeny of a single parental cell may not
necessarily be completely identical in genomic or total DNA complement to the original
parent, due to accidental or deliberate mutation.

As used herein, the term "control sequence" refers to a nucleic acid having a base
sequence which is recognized by the host organism to effect the expression of encoded
sequences to which it is ligated. The nature of such control sequences differs
depending upon the host organism; in prokaryotes, such control sequences generally
include a promoter, ribosomal binding site, terminators, and in some cases operators; in
eukaryotes, generally such control sequences include promoters, terminators and in some
instances, enhancers. The term control sequence is intended to include at a minimum, all
components whose presence is necessary for expression, and may also include additional
components whose presence is advantageous, for example, leader sequences.

As used herein, the term "operably linked" refers to sequences joined or ligated to
function in their intended manner. For example, a control sequence is operably linked to
a coding sequence by ligation in such a way that expression of the coding sequence is
achieved under conditions compatible with the control sequence and host cell.

The "metabolism" of a substance, as used herein, means any aspect of the
expression, function, action, or regulation of the substance. The metabolism of a
substance includes modifications, e.g., covalent or non-covalent modifications of the
substance. The metabolism of a substance includes modifications, e.g., covalent or non-
covalent modification, the substance induces in other substances. The metabolism of a
substance also includes changes in the distribution of the substance. The metabolism of a
substance includes changes the substance induces in the distribution of other substances.

A "sample" as used herein refers to a biological sample, such as, for example,
tissue or fluid isolated from an individual (including without limitation plasma, serum,
cerebrospinal fluid, lymph, tears, saliva and tissue sections) or from in vitro cell culture
constituents, as well as samples from the environment.

Technical and scientific terms used herein have the meanings commonly
understood by one of ordinary skill in the art to which the present invention pertains,
unless otherwise defined. Reference is made herein to various methodologies known to
those of skill in the art. Publications and other materials setting forth such known
methodologies to which reference is made are incorporated herein by reference in their
entireties as though set forth in full. The practice of the invention will employ, unless
otherwise indicated, conventional techniques of chemistry, molecular biology,
microbiology, recombinant DNA, and immunology, which are within the skill of the art.
Such techniques are explained fully in the literature. See e.g., Sambrook, Fritsch, and
Maniatis, Molecular Cloning: A Laboratory Manual, 2nd ed. (1989); DNA Cloning,
Volumes I and II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984);
Nucleic Acid Hybridization (B.D. Hames & S.J. Higgins eds. 1984); the series, Methods
in Enzymology (Academic Press, Inc.), particularly Vol. 154 and Vol. 155 (Wu and
Grossman, eds.); PCR-A Practical Approach (McPherson, Quirke, and Taylor, eds.,
1991); Immunology, 2d Edition, 1989, Roitt et al., C.V. Mosby Company, New
York; Advanced Immunology, 2d Edition, 1991, Male et al., Gower Medical Publishing,
New York; DNA Cloning: A Practical Approach, Volumes I and II, 1985 (D.N. Glover
ed.); Oligonucleotide Synthesis, 1984 (M.J. Gait ed.); Transcription and Translation,
1984 (Hames and Higgins eds.); Animal Cell Culture, 1986 (R.I. Freshney ed.);
Immobilized Cells and Enzymes, 1986 (IRL Press); Perbal, 1984, A Practical Guide to
Molecular Cloning; Gene Transfer Vectors for Mammalian Cells, 1987 (J. H. Miller and
M. P. Calos eds., Cold Spring Harbor Laboratory); Martin J. Bishop, ed., Guide to
Human Genome Computing, 2d Edition, Academic Press, San Diego, CA. (1998); and
Leonard F. Peruski, Jr., and Anne Harwood Peruski, The Internet and the New Biology:
Tools for Genomic and Molecular Research, American Society for Microbiology,
Washington, D.C. (1997).

Any suitable materials and/or methods known to those of skill can be utilized in
carrying out the present invention; however, preferred materials and/or methods are
described. Materials, reagents and the like to which reference is made in the following
description and examples are obtainable from commercial sources, unless otherwise
noted.



B. FRAGILIS GENOMIC SEQUENCE 

This invention provides nucleotide sequences of the genome of B. fragilis which
thus comprise a DNA sequence library of B. fragilis genomic DNA. The detailed
description that follows provides nucleotide sequences of B. fragilis, and also describes
how the sequences were obtained and how ORFs and protein-coding sequences were
identified. Also described are compositions and methods of using the disclosed B.
fragilis sequences in methods including diagnostic and therapeutic applications.
Furthermore, the library can be used as a database for identification and comparison of
medically important sequences in this and other strains of B. fragilis.

To determine the genomic sequence of B. fragilis, DNA from strain 14062 of B.
fragilis was isolated after Zymolyase digestion, sodium dodecyl sulfate lysis, potassium
acetate precipitation, phenol:chloroform extraction and ethanol precipitation (Soll, D.R.,
T. Srikantha and S.R. Lockhart: Characterizing Developmentally Regulated Genes in B.
fragilis. In Microbial Genome Methods. K.W. Adolph, editor. CRC Press. New York,
p. 17-37.). DNA was sheared hydrodynamically using an HPLC (Oefner et al., 1996) to
an insert size of 2000-3000 bp. After size fractionation by gel electrophoresis the
fragments were blunt-ended, ligated to adapter oligonucleotides and cloned into the
pGTC (Thomann) vector to construct a "shotgun" subclone library.

DNA sequencing was achieved using established ABI sequencing methods on
ABI377 automated DNA sequencers. The cloning and sequencing procedures are
described in more detail in the Exemplification.

Individual sequence reads were assembled using PHRAP (P. Green, Abstracts of
DOE Human Genome Program Contractor-Grantee Workshop V, Jan. 1996, p. 157). The
average contig length was about 3-4 kb.

All subsequent steps were based on sequencing by ABI377 automated DNA
sequencing methods.

A variety of approaches may be used to order the contigs so as to obtain a
continuous sequence representing the entire B. fragilis genome. Synthetic
oligonucleotides are designed that are complementary to sequences at the end of each
contig. These oligonucleotides may be hybridized to libraries of B. fragilis genomic DNA
in, for example, lambda phage vectors or plasmid vectors to identify clones that contain
sequences corresponding to the junctional regions between individual contigs. Such
clones are then used to isolate template DNA and the same oligonucleotides are used as
primers in polymerase chain reaction (PCR) to amplify junctional fragments, the
nucleotide sequence of which is then determined.
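
One way the design of contig-end oligonucleotides might be sketched is shown below. The oligonucleotide length of 20 nucleotides, the function names, and the outward-facing orientation of the two oligonucleotides are assumptions made for illustration; the text does not specify them, and practical primer design would also weigh melting temperature and sequence uniqueness.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Anti-parallel Watson-Crick complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def contig_end_oligos(contig, length=20):
    """Return (left_outward, right_outward) oligonucleotides for a contig.

    Both are written 5' to 3' and point out of the contig, so that a PCR
    product spanning a junction would extend into the unsequenced gap.
    The default length of 20 is an assumed, illustrative value.
    """
    contig = contig.upper()
    if len(contig) < length:
        raise ValueError("contig is shorter than the requested oligo length")
    left_outward = reverse_complement(contig[:length])  # anneals at the left end
    right_outward = contig[-length:]                    # anneals at the right end
    return left_outward, right_outward


# Hypothetical toy contig, for illustration only.
left, right = contig_end_oligos("AAAACCCCGGGGTTTTACGT", length=4)
print(left, right)
```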

The B. fragilis sequences were analyzed for the presence of open reading frames
(ORFs) comprising at least 180 nucleotides. As a result of the analysis of ORFs based on
stop-to-stop codon reads, it should be understood that these ORFs may not correspond to
the ORF of a naturally-occurring B. fragilis polypeptide. These ORFs may contain start
codons which indicate the initiation of protein synthesis of a naturally-occurring B.
fragilis polypeptide. Such start codons within the ORFs provided herein were identified
by those of ordinary skill in the relevant art, and the resulting ORF and the encoded B.
fragilis polypeptide are within the scope of this invention. For example, within the ORFs
a codon such as AUG or GUG (encoding methionine or valine) which is part of the
initiation signal for protein synthesis was identified and the portion of an ORF
corresponding to a naturally-occurring B. fragilis polypeptide was recognized. The
predicted coding regions were defined by evaluating the coding potential of such
sequences with the program GENEMARK™ (Borodovsky and McIninch, 1993, Comp. Chem.
17:123).
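
The stop-to-stop ORF analysis described above can be illustrated with a short sketch. This is not the procedure actually used to generate the Sequence Listing, and it does not reproduce GENEMARK; it is a minimal scan, under the stated 180-nucleotide minimum, of all six reading frames of a DNA string. The function names are hypothetical, and stretches that run off the 3' end of the sequence without reaching a stop codon are simply ignored here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG"}  # DNA equivalents of AUG and GUG
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def stop_to_stop_orfs(seq, min_len=180):
    """Yield (strand, frame, orf) for stop-free stretches of >= min_len nt.

    Each stretch ends just before an in-frame stop codon and begins either
    after the previous in-frame stop codon or at the start of the frame.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            begin = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if i - begin >= min_len:
                        yield strand, frame, s[begin:i]
                    begin = i + 3


def first_start_codon(orf):
    """Offset of the first in-frame ATG or GTG in an ORF, or None."""
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in START_CODONS:
            return i
    return None


# Hypothetical toy sequence: stop, 10 Ala codons, ATG, 60 Ala codons, stop.
toy = "TAA" + "GCT" * 10 + "ATG" + "GCT" * 60 + "TAG"
for strand, frame, orf in stop_to_stop_orfs(toy):
    print(strand, frame, len(orf), first_start_codon(orf))
```

In this sketch the stop-to-stop region is reported first and the candidate start codon is located afterwards, mirroring the two-stage description in the text.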

Each predicted ORF amino acid sequence was compared with all sequences
found in current GENBANK, SWISS-PROT, and PIR databases using the BLAST
algorithm. BLAST identifies local alignments between the ORF sequence and the
sequences in the databank and reports the probability that each alignment occurred by
chance (Altschul et al., 1990, J. Mol. Biol. 215:403-410). Homologous ORFs
(probabilities less than 10^-5 by chance) and ORFs that are probably non-homologous
(probabilities greater than 10^-5 by chance) but have good codon usage were identified.
Both homologous sequences and non-homologous sequences with good codon usage are
likely to encode proteins and are encompassed by the invention.
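
The classification rule just described can be written out as a small decision function. The record layout, field names and example values below are hypothetical; in particular, "good codon usage" is represented as a precomputed yes/no flag because the text does not define how it was scored, and a probability exactly equal to the threshold is treated here as non-homologous, a case the text leaves open.

```python
from dataclasses import dataclass

P_THRESHOLD = 1e-5  # chance probability cutoff stated in the text


@dataclass
class OrfHit:
    orf_id: str             # hypothetical identifier
    best_p: float           # smallest chance probability among database hits
    good_codon_usage: bool  # assumed to be determined by a separate analysis


def classify(hit):
    """Assign an ORF to one of the categories described in the text."""
    if hit.best_p < P_THRESHOLD:
        return "homologous"
    if hit.good_codon_usage:
        return "non-homologous, good codon usage"
    return "not retained"


# Made-up example records, for illustration only.
hits = [
    OrfHit("orf_a", 1e-30, True),
    OrfHit("orf_b", 0.2, True),
    OrfHit("orf_c", 0.2, False),
]
for hit in hits:
    print(hit.orf_id, classify(hit))
```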

B. FRAGILIS NUCLEIC ACIDS 

The present invention provides a library of B. fragilis-derived nucleic acid
sequences. The libraries provide probes, primers, and markers which are used
in epidemiological studies. The present invention also provides a library of B. fragilis-
derived nucleic acid sequences which comprise or encode targets for therapeutic drugs.

The nucleic acids of this invention may be obtained directly from the DNA of the
above referenced B. fragilis strain by using the polymerase chain reaction (PCR). See
"PCR, A Practical Approach" (McPherson, Quirke, and Taylor, eds., IRL Press, Oxford,
UK, 1991) for details about the PCR. High fidelity PCR is used to ensure a faithful DNA
copy prior to expression. In addition, the authenticity of amplified products is verified by
conventional sequencing methods. Clones carrying the desired sequences described in
this invention may also be obtained by screening the libraries by means of the PCR or by
hybridization of synthetic oligonucleotide probes to filter lifts of the library colonies or
plaques as known in the art (see, e.g., Sambrook et al., Molecular Cloning, A Laboratory
Manual, 2nd edition, 1989, Cold Spring Harbor Press, NY).

It is also possible to obtain nucleic acids encoding B. fragilis polypeptides from a
cDNA library in accordance with protocols herein described. A cDNA encoding a B.
fragilis polypeptide can be obtained by isolating total mRNA from an appropriate strain.
Double stranded cDNAs can then be prepared from the total mRNA. Subsequently, the
cDNAs can be inserted into a suitable plasmid or viral (e.g., bacteriophage) vector using
any one of a number of known techniques. Genes encoding B. fragilis polypeptides can
also be cloned using established polymerase chain reaction techniques in accordance with
the nucleotide sequence information provided by the invention. The nucleic acids of the
invention can be DNA or RNA. Preferred nucleic acids of the invention are contained in
the Sequence Listing.

The nucleic acids of the invention can also be chemically synthesized using
standard techniques. Various methods of chemically synthesizing polydeoxynucleotides
are known, including solid-phase synthesis which, like peptide synthesis, has been fully
automated in commercially available DNA synthesizers (See, e.g., Itakura et al. U.S.
Patent No. 4,598,049; Caruthers et al. U.S. Patent No. 4,458,066; and Itakura U.S. Patent
Nos. 4,401,796 and 4,373,071, incorporated by reference herein).

In another example, DNA can be chemically synthesized using, e.g., the
phosphoramidite solid support method of Matteucci et al., 1981, J. Am. Chem. Soc.
103:3185, the method of Yoo et al., 1989, J. Biol. Chem. 264:17078, or other well
known methods. This can be done by sequentially linking a series of oligonucleotide
cassettes comprising pairs of synthetic oligonucleotides, as described below.

Nucleic acids isolated or synthesized in accordance with features of the present
invention are useful, by way of example, without limitation, as probes, primers, capture
ligands, antisense genes and for developing expression systems for the synthesis of
proteins and peptides corresponding to such sequences. As probes, primers, capture
ligands and antisense agents, the nucleic acid normally consists of all or part
(approximately twenty or more nucleotides for specificity as well as the ability to form
stable hybridization products) of the nucleic acids of the invention contained in the
Sequence Listing. These uses are described in further detail below.






PROBES 

A nucleic acid isolated or synthesized in accordance with the sequence of the
invention contained in the Sequence Listing can be used as a probe to specifically detect
B. fragilis. With the sequence information set forth in the present application, sequences
of twenty or more nucleotides are identified which provide the desired inclusivity and
exclusivity with respect to B. fragilis and extraneous nucleic acids likely to be
encountered during hybridization conditions. More preferably, the sequence will
comprise at least about twenty to thirty nucleotides to convey stability to the
hybridization product formed between the probe and the intended target molecules.
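
As a rough illustration of how candidate probe sequences of twenty or more nucleotides might be shortlisted from the sequence information, the following sketch keeps every window of a target sequence that has no exact match, on either strand, in a set of extraneous sequences. The function names and the exact-match criterion are assumptions of this sketch; exclusivity under real hybridization conditions also depends on near matches and stringency, which are not modeled here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_probes(target: str, background: list[str], length: int = 20) -> list[str]:
    """List windows of `length` nt from `target` that have no exact match
    in any background sequence on either strand (illustrative criterion only)."""
    target = target.upper()
    excluded = set()
    for seq in background:
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - length + 1):
                excluded.add(strand[i:i + length])
    probes = []
    for i in range(len(target) - length + 1):
        window = target[i:i + length]
        if window not in excluded and window not in probes:
            probes.append(window)
    return probes
```

Candidates returned by such a screen would still need to be checked empirically for inclusivity across B. fragilis strains.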

Sequences larger than 1000 nucleotides in length are difficult to synthesize but
can be generated by recombinant DNA techniques. Individuals skilled in the art will
readily recognize that the nucleic acids, for use as probes, can be provided with a label to
facilitate detection of a hybridization product.

Nucleic acid isolated or synthesized in accordance with the sequence of the
invention contained in the Sequence Listing can also be useful as probes to detect
homologous regions (especially homologous genes) of other Bacteroides species using
appropriate stringency hybridization conditions as described herein.

CAPTURE LIGAND

For use as a capture ligand, the nucleic acid selected in the manner described
above with respect to probes can be readily associated with a support. The manner in
which nucleic acid is associated with supports is well known. Nucleic acid having
twenty or more nucleotides in a sequence of the invention contained in the Sequence
Listing has utility to separate B. fragilis nucleic acid of one strain from the nucleic
acid of another strain as well as from other organisms. Nucleic acid having twenty
or more nucleotides in a sequence of the invention contained in the Sequence Listing can
also have utility to separate other Bacteroides species from each other and from other
organisms. Preferably, the sequence will comprise at least about twenty nucleotides to
convey stability to the hybridization product formed between the probe and the intended
target molecules. Sequences larger than 1000 nucleotides in length are difficult to
synthesize but can be generated by recombinant DNA techniques.



PRIMERS 

Nucleic acid isolated or synthesized in accordance with the sequences described
herein has utility as primers for the amplification of B. fragilis nucleic acid. These
nucleic acids may also have utility as primers for the amplification of nucleic acids in
other Bacteroides species. With respect to polymerase chain reaction (PCR) techniques,
nucleic acid sequences of > 10-15 nucleotides of the invention contained in the Sequence
Listing have utility in conjunction with suitable enzymes and reagents to create copies of
B. fragilis nucleic acid. More preferably, the sequence will comprise twenty or more
nucleotides to convey stability to the hybridization product formed between the primer
and the intended target molecules. Binding conditions of primers greater than 100
nucleotides are more difficult to control to obtain specificity. High fidelity PCR can be
used to ensure a faithful DNA copy prior to expression. In addition, amplified products
can be checked by conventional sequencing methods.
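
For illustration, a primer pair flanking a known region can be read directly off its sequence: the forward primer copies the start of the top strand and the reverse primer is the reverse complement of its end. The sketch below assumes a default primer length of twenty nucleotides, as preferred in the text, and rejects lengths outside 10 to 100 nucleotides; the function names and the exact bounds chosen are assumptions of this sketch.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair(template: str, primer_len: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers, both written 5' to 3', that
    amplify the whole of `template` (given as the top strand, 5' to 3')."""
    if not 10 <= primer_len <= 100:
        raise ValueError("primer length outside the illustrative 10-100 nt range")
    template = template.upper()
    if len(template) < 2 * primer_len:
        raise ValueError("template too short for two non-overlapping primers")
    forward = template[:primer_len]                       # anneals to the bottom strand
    reverse = reverse_complement(template[-primer_len:])  # anneals to the top strand
    return forward, reverse
```

The sketch ignores melting temperature, secondary structure, and primer-dimer formation, all of which would be considered in practice.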

The copies can be used in diagnostic assays to detect specific sequences,
including genes from B. fragilis and/or other Bacteroides species. The copies can also
be incorporated into cloning and expression vectors to generate polypeptides
corresponding to the nucleic acid synthesized by PCR, as is described in greater detail
herein.

The nucleic acids of the present invention find use as templates for the
recombinant production of B. fragilis-derived peptides or polypeptides.

ANTISENSE

Nucleic acid or nucleic acid-hybridizing derivatives isolated or synthesized in
accordance with the sequences described herein have utility as antisense agents to
prevent the expression of B. fragilis genes. These sequences also have utility as
antisense agents to prevent expression of genes of other Bacteroides species.

In one embodiment, nucleic acid or derivatives corresponding to B. fragilis
nucleic acids are loaded into a suitable carrier such as a liposome or bacteriophage for
introduction into bacterial cells. For example, a nucleic acid having twenty or more
nucleotides is capable of binding to bacterial nucleic acid or bacterial messenger RNA.
Preferably, the antisense nucleic acid is comprised of 20 or more nucleotides to provide
necessary stability of a hybridization product of non-naturally occurring nucleic acid and
bacterial nucleic acid and/or bacterial messenger RNA. Nucleic acid having a sequence
greater than 1000 nucleotides in length is difficult to synthesize but can be generated by
recombinant DNA techniques. Methods for loading antisense nucleic acid in liposomes
are known in the art as exemplified by U.S. Patent 4,241,046 issued December 23, 1980 to
Papahadjopoulos et al.

The present invention encompasses isolated polypeptides and nucleic acids
derived from B. fragilis that are useful as reagents for diagnosis of bacterial infection,
components of effective anti-bacterial vaccines, and/or as targets for anti-bacterial drugs,
including anti-B. fragilis drugs.

EXPRESSION OF B. FRAGILIS NUCLEIC ACIDS 

Table 2, which is appended herewith and which forms part of the present
specification, provides a list of open reading frames (ORFs) in both strands and a
putative identification of the particular function of a polypeptide which is encoded by
each ORF, based on the homology match (determined by the BLASTP2 algorithm) of the
predicted polypeptide with known proteins encoded by ORFs in other organisms. An
ORF is a region of nucleic acid which encodes a polypeptide. This region may represent
a portion of a coding sequence or a total sequence and was determined from stop to stop
codons. The first column contains a designation for the ORF ("ORF Name"). The second
and third columns list the SEQ ID numbers for the nucleic acid ("NT ID") and amino
acid ("AA ID") sequences corresponding to each ORF, respectively. The fourth and fifth
columns list the length of the nucleic acid ORF ("NT Length") and the length of the
amino acid ORF ("AA Length"), respectively. The nucleotide sequence corresponding
to each ORF begins at the first nucleotide immediately following a stop codon and ends
at the nucleotide immediately preceding the next downstream stop codon in the same
reading frame. It will be recognized by one skilled in the art that the natural translation
initiation sites will correspond to ATG, GTG, or TTG codons located within the ORFs.
The natural initiation sites depend not only on the sequence of a start codon but also on
the context of the DNA sequence adjacent to the start codon. Usually, a recognizable
ribosome binding site is found within 20 nucleotides upstream from the initiation codon.
In some cases where genes are translationally coupled and coordinately expressed
together in "operons", ribosome binding sites are not present, but the initiation codon of a
downstream gene may occur very close to, or overlap, the stop codon of an upstream
gene in the same operon. The correct start codons can be generally identified without
undue experimentation because only a few codons need be tested. It is recognized that
the translational machinery in bacteria initiates all polypeptide chains with the amino
acid methionine, regardless of the sequence of the start codon. In some cases,
polypeptides are post-translationally modified, resulting in an N-terminal amino acid
other than methionine in vivo. The sixth and seventh columns provide metrics for
assessing the likelihood of the homology match (determined by the BLASTP2
algorithm), as is known in the art, to the genes indicated in the description frame
("Description") defined further below. These genes in the Description were identified
when the designated ORF was compared against a comprehensive non-redundant protein
database. Specifically, the sixth column represents the Blast Score ("Score") for the
match (a higher score is a better match), and the seventh column represents the
probability ("Probability") for the match (the probability that such a match could have
occurred by chance; the lower the value, the more likely the match is valid). If a
BLASTP2 score of less than 100 was obtained, no value is reported in the table. The
remaining fields below the columns contain additional information relating to the
potential function of the sequence based on the BLASTP2 analysis. Where a match was
discovered, the field "Protein name" lists the protein's name identified from the match. In
addition, one skilled in the art would be able to identify the match and elucidate its
function using the "Locus name" and, where available, the accession number ("Acc#") from
the database. Lastly, one skilled in the art would appreciate that the "Description" field
further describes the potential function of the protein based on this analysis. This
information allows one of ordinary skill in the art to determine a potential use for each
identified coding sequence and, as a result, allows the polypeptides of the present
invention to be used for commercial and industrial purposes.
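
The stop-to-stop definition of an ORF given above, together with the listing of in-frame ATG, GTG, or TTG codons as candidate initiation sites, can be sketched as follows. This is a minimal illustration and not the procedure actually used to build Table 2: it scans only the strand supplied (the opposite strand would be handled by passing its reverse complement), uses 0-based coordinates with an exclusive end, and the function names are assumptions of this sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # candidate initiation codons named in the text


def stop_to_stop_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each region bounded by two in-frame stop
    codons. `start` is the first nucleotide after the upstream stop and `end`
    is the first nucleotide of the downstream stop (exclusive end)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None  # remains None until a stop codon is seen in this frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if orf_start is not None and pos - orf_start >= 3 * min_codons:
                    orfs.append((frame, orf_start, pos))
                orf_start = pos + 3
    return orfs


def candidate_starts(seq: str, orf_start: int, orf_end: int) -> list[int]:
    """Positions of in-frame ATG, GTG, or TTG codons within one ORF."""
    seq = seq.upper()
    return [p for p in range(orf_start, orf_end, 3) if seq[p:p + 3] in START_CODONS]
```

Choosing among the candidate starts would then rely on sequence context, such as a ribosome binding site within about 20 nucleotides upstream, which this sketch does not evaluate.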

Using the information provided in SEQ ID NO: 1 - SEQ ID NO: 5222, SEQ ID
NO: 5223 - SEQ ID NO: 10444 and in Table 2 together with routine cloning and
sequencing methods, one of ordinary skill in the art will be able to clone and sequence all
the nucleic acid fragments of interest including open reading frames (ORFs) encoding a
large variety of proteins of B. fragilis.

Nucleic acid isolated or synthesized in accordance with the sequences described
herein has utility to generate polypeptides. The nucleic acid of the invention
exemplified in SEQ ID NO: 1 - SEQ ID NO: 5222 and in Table 2 or fragments of said
nucleic acid encoding active portions of B. fragilis polypeptides can be cloned into
suitable vectors or used to isolate nucleic acid. The isolated nucleic acid is combined
with suitable DNA linkers and cloned into a suitable vector.

The function of a specific gene or operon can be ascertained by expression in a
bacterial strain under conditions where the activity of the gene product(s) specified by the
gene or operon in question can be specifically measured. Alternatively, a gene product
may be produced in large quantities in an expressing strain for use as an antigen, an
industrial reagent, for structural studies, etc. This expression can be accomplished in a
mutant strain which lacks the activity of the gene to be tested, or in a strain that does not
produce the same gene product(s). This includes, but is not limited to, eucaryotic species
such as the yeast Saccharomyces cerevisiae, Methanobacterium strains or other Archaea,
and Eubacteria such as E. coli, B. subtilis, S. aureus, S. pneumoniae or Pseudomonas
putida. In some cases the expression host will utilize the natural B. fragilis promoter
whereas in others, it will be necessary to drive the gene with a promoter sequence
derived from the expressing organism (e.g., an E. coli beta-galactosidase promoter for
expression in E. coli).

To express a gene product using the natural B. fragilis promoter, a procedure such
as the following can be used. A restriction fragment containing the gene of interest,
together with its associated natural promoter element and regulatory sequences
(identified using the DNA sequence data) is cloned into an appropriate recombinant
plasmid containing an origin of replication that functions in the host organism and an
appropriate selectable marker. This can be accomplished by a number of procedures
known to those skilled in the art. It is most preferably done by cutting the plasmid and
the fragment to be cloned with the same restriction enzyme to produce compatible ends
that can be ligated to join the two pieces together. The recombinant plasmid is
introduced into the host organism by, for example, electroporation and cells containing
the recombinant plasmid are identified by selection for the marker on the plasmid.
Expression of the desired gene product is detected using an assay specific for that gene
product.

In the case of a gene that requires a different promoter, the body of the gene
(coding sequence) is specifically excised and cloned into an appropriate expression
plasmid. This subcloning can be done by several methods, but is most easily
accomplished by PCR amplification of a specific fragment and ligation into an
expression plasmid after treating the PCR product with a restriction enzyme or
exonuclease to create suitable ends for cloning.

A suitable host cell for expression of a gene can be any procaryotic or eucaryotic
cell. Suitable methods for transforming host cells can be found in Sambrook et al.
(Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory
Press (1989)), and other laboratory textbooks.

For example, a host cell transfected with a nucleic acid vector directing
expression of a nucleotide sequence encoding a B. fragilis polypeptide can be cultured
under appropriate conditions to allow expression of the polypeptide to occur. Suitable
media for cell culture are well known in the art. Polypeptides of the invention can be
isolated from cell culture medium, host cells, or both using techniques known in the art
for purifying proteins including ion-exchange chromatography, gel filtration
chromatography, ultrafiltration, electrophoresis, and immunoaffinity purification with
antibodies specific for such polypeptides. Additionally, in many situations, polypeptides
can be produced by chemical cleavage of a native protein (e.g., tryptic digestion) and the
cleavage products can then be purified by standard techniques.

In the case of membrane bound proteins, these can be isolated from a host cell by
contacting a membrane-associated protein fraction with a detergent forming a solubilized
complex, where the membrane-associated protein is no longer entirely embedded in the
membrane fraction and is solubilized at least to an extent which allows it to be
chromatographically isolated from the membrane fraction. Chromatographic techniques
which can be used in the final purification step are known in the art and include
hydrophobic interaction, lectin affinity, ion exchange, dye affinity and immunoaffinity.

One strategy to maximize recombinant B. fragilis peptide expression in E. coli is
to express the protein in a host bacterium with an impaired capacity to proteolytically
cleave the recombinant protein (Gottesman, S., Gene Expression Technology: Methods
in Enzymology 185, Academic Press, San Diego, California (1990) 119-128). Another
strategy would be to alter the nucleic acid encoding a B. fragilis peptide to be inserted
into an expression vector so that the individual codons for each amino acid would be
those preferentially utilized in highly expressed E. coli proteins (Wada et al., (1992) Nuc.
Acids Res. 20:2111-2118). Such alteration of nucleic acids of the invention can be
carried out by standard DNA synthesis techniques.

The nucleic acids of the invention can also be chemically synthesized using
standard techniques. Various methods of chemically synthesizing polydeoxynucleotides
are known, including solid-phase synthesis which, like peptide synthesis, has been fully
automated in commercially available DNA synthesizers (See, e.g., Itakura et al. U.S.
Patent No. 4,598,049; Caruthers et al. U.S. Patent No. 4,458,066; and Itakura U.S. Patent
Nos. 4,401,796 and 4,373,071, incorporated by reference herein).

The present invention provides a library of B. fragilis-derived nucleic acid
sequences. The libraries provide probes, primers, and markers which can be used
in epidemiological studies. The present invention also provides a library of B.
fragilis-derived nucleic acid sequences which comprise or encode targets for therapeutic
drugs.

Nucleic acids comprising any of the sequences disclosed herein or sub-sequences
thereof can be prepared by standard methods using the nucleic acid sequence information
provided in SEQ ID NO: 1 - SEQ ID NO: 5222. For example, DNA can be chemically
synthesized using, e.g., the phosphoramidite solid support method of Matteucci et al.,
1981, J. Am. Chem. Soc. 103:3185, the method of Yoo et al., 1989, J. Biol. Chem.
264:17078, or other well known methods. This can be done by sequentially linking a
series of oligonucleotide cassettes comprising pairs of synthetic oligonucleotides, as
described below.

Of course, due to the degeneracy of the genetic code, many different nucleotide
sequences can encode polypeptides having the amino acid sequences defined by SEQ ID
NO: 5223 - SEQ ID NO: 10444 or sub-sequences thereof. The codons can be selected
for optimal expression in prokaryotic or eukaryotic systems. Such degenerate variants
are also encompassed by this invention.
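
To make the degeneracy point concrete, the sketch below counts how many nucleotide sequences encode a given peptide under the standard genetic code and back-translates a peptide using a caller-supplied table of preferred codons. The preference table is deliberately left to the caller, since real host-specific codon preferences are not given in this document; the function names and the fallback rule (alphabetically first synonymous codon) are assumptions of this sketch.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def synonymous_codons(amino_acid: str) -> list[str]:
    """All codons encoding one amino acid, in alphabetical order."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def degenerate_count(peptide: str) -> int:
    """Number of distinct coding sequences for `peptide` (stop codon excluded)."""
    return prod(len(synonymous_codons(aa)) for aa in peptide.upper())


def back_translate(peptide: str, preferred: dict[str, str]) -> str:
    """Choose one codon per residue: the caller's preferred codon if supplied,
    otherwise the alphabetically first synonymous codon (arbitrary fallback)."""
    out = []
    for aa in peptide.upper():
        codon = preferred.get(aa, synonymous_codons(aa)[0])
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
        out.append(codon)
    return "".join(out)


def translate(dna: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))
```

For instance, the tripeptide Met-Leu-Ser has 1 x 6 x 6 = 36 possible coding sequences, any of which would be a degenerate variant in the sense used above.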

Insertion of nucleic acids (typically DNAs) encoding the polypeptides of the
invention into a vector is easily accomplished when the termini of both the DNAs and the
vector comprise compatible restriction sites. If this cannot be done, it may be necessary
to modify the termini of the DNAs and/or vector by digesting back single-stranded DNA
overhangs generated by restriction endonuclease cleavage to produce blunt ends, or to
achieve the same result by filling in the single-stranded termini with an appropriate DNA
polymerase.

Alternatively, any site desired may be produced, e.g., by ligating nucleotide
sequences (linkers) onto the termini. Such linkers may comprise specific oligonucleotide
sequences that define desired restriction sites. Restriction sites can also be generated by
the use of the polymerase chain reaction (PCR). See, e.g., Saiki et al., 1988, Science
239:48. The cleaved vector and the DNA fragments may also be modified if required by
homopolymeric tailing.

The nucleic acids of the invention may be isolated directly from cells.
Alternatively, the polymerase chain reaction (PCR) method can be used to produce the
nucleic acids of the invention, using either chemically synthesized strands or genomic
material as templates. Primers used for PCR can be synthesized using the sequence
information provided herein and can further be designed to introduce appropriate new
restriction sites, if desirable, to facilitate incorporation into a given vector for
recombinant expression.
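
One common way to introduce a new restriction site by PCR is to add the recognition sequence, preceded by a few extra bases, to the 5' end of the annealing portion of a primer, after confirming that the chosen enzyme does not also cut inside the fragment to be cloned. The sketch below illustrates this with three widely used enzymes; the choice of enzymes, the four-base 5' clamp, and the function names are assumptions of this sketch rather than requirements of the invention.

```python
# Recognition sequences of three commonly used restriction enzymes.
# All three are palindromic, so checking one strand is sufficient.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def tailed_primer(annealing: str, enzyme: str, clamp: str = "GCGC") -> str:
    """Return clamp + recognition site + annealing sequence, written 5' to 3'.
    The clamp is a short illustrative 5' extension intended to aid cleavage
    near the end of a PCR product."""
    return clamp + RESTRICTION_SITES[enzyme] + annealing.upper()


def cuts_within(insert: str, enzyme: str) -> bool:
    """True if the enzyme's recognition site already occurs inside the insert,
    in which case a different enzyme would normally be chosen."""
    return RESTRICTION_SITES[enzyme] in insert.upper()
```

In use, one would first call `cuts_within` on the coding sequence for each candidate enzyme and then build the two tailed primers with enzymes whose sites are absent from the insert and present in the vector's cloning region.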

The nucleic acids of the present invention may be flanked by natural B. fragilis
regulatory sequences, or may be associated with heterologous sequences, including
promoters, enhancers, response elements, signal sequences, polyadenylation sequences,
introns, 5'- and 3'-noncoding regions, and the like. The nucleic acids may also be
modified by many means known in the art. Non-limiting examples of such modifications
include methylation, "caps", substitution of one or more of the naturally occurring
nucleotides with an analog, internucleotide modifications such as, for example, those
with uncharged linkages (e.g., methyl phosphonates, phosphotriesters,
phosphoroamidates, carbamates, etc.) and with charged linkages (e.g., phosphorothioates,
phosphorodithioates, etc.). Nucleic acids may contain one or more additional covalently
linked moieties, such as, for example, proteins (e.g., nucleases, toxins, antibodies, signal
peptides, poly-L-lysine, etc.), intercalators (e.g., acridine, psoralen, etc.), chelators (e.g.,
metals, radioactive metals, iron, oxidative metals, etc.), and alkylators. PNAs are also
included. The nucleic acid may be derivatized by formation of a methyl or ethyl
phosphotriester or an alkyl phosphoramidate linkage. Furthermore, the nucleic acid
sequences of the present invention may also be modified with a label capable of
providing a detectable signal, either directly or indirectly. Exemplary labels include
radioisotopes, fluorescent molecules, biotin, and the like.

The invention also provides nucleic acid vectors comprising the disclosed B.
fragilis-derived sequences or derivatives or fragments thereof. A large number of
vectors, including plasmid and bacterial vectors, have been described for replication
and/or expression in a variety of eukaryotic and prokaryotic hosts, and may be used for
cloning or protein expression.

The encoded B. fragilis polypeptides may be expressed by using many known
vectors, such as pUC plasmids, pET plasmids (Novagen, Inc., Madison, WI), or pRSET
or pREP (Invitrogen, San Diego, CA), and many appropriate host cells, using methods
disclosed or cited herein or otherwise known to those skilled in the relevant art. The
particular choice of vector/host is not critical to the practice of the invention.

Recombinant cloning vectors will often include one or more replication systems
for cloning or expression, one or more markers for selection in the host, e.g. antibiotic
resistance, and one or more expression cassettes. The inserted B. fragilis coding
sequences may be synthesized by standard methods, isolated from natural sources, or
prepared as hybrids, etc. Ligation of the B. fragilis coding sequences to transcriptional
regulatory elements and/or to other amino acid coding sequences may be achieved by
known methods. Suitable host cells may be transformed/transfected/infected as
appropriate by any suitable method including electroporation, CaCl2-mediated DNA
uptake, bacterial infection, microinjection, microprojectile, or other established methods.

Appropriate host cells include bacteria, archaebacteria, fungi, especially yeast, and
plant and animal cells, especially mammalian cells. Of particular interest are B. fragilis,
E. coli, B. subtilis, Saccharomyces cerevisiae, Saccharomyces carlsbergensis,
Schizosaccharomyces pombe, SF9 cells, C129 cells, 293 cells, Neurospora, and CHO
cells, COS cells, HeLa cells, and immortalized mammalian myeloid and lymphoid cell
lines. Preferred replication systems include M13, ColE1, SV40, baculovirus, lambda,
adenovirus, and the like. A large number of transcription initiation and termination
regulatory regions have been isolated and shown to be effective in the transcription and
translation of heterologous proteins in the various hosts. Examples of these regions,
methods of isolation, manner of manipulation, etc. are known in the art. Under
appropriate expression conditions, host cells can be used as a source of recombinantly
produced B. fragilis-derived peptides and polypeptides.

Advantageously, vectors may also include a transcription regulatory element (i.e.,
a promoter) operably linked to the B. fragilis portion. The promoter may optionally
contain operator portions and/or ribosome binding sites. Non-limiting examples of
bacterial promoters compatible with E. coli include: β-lactamase (penicillinase)
promoter; lactose promoter; tryptophan (trp) promoter; araBAD (arabinose) operon
promoter; lambda-derived P_L promoter and N gene ribosome binding site; and the hybrid
tac promoter derived from sequences of the trp and lac UV5 promoters. Non-limiting
examples of yeast promoters include 3-phosphoglycerate kinase promoter,
glyceraldehyde-3-phosphate dehydrogenase (GAPDH) promoter, galactokinase (GAL1)
promoter, galactoepimerase promoter, and alcohol dehydrogenase (ADH) promoter.
Suitable promoters for mammalian cells include without limitation viral promoters such
as that from Simian Virus 40 (SV40), Rous sarcoma virus (RSV), adenovirus (ADV),
and bovine papilloma virus (BPV). Mammalian cells may also require terminator
sequences, polyA addition sequences and enhancer sequences to increase expression.
Sequences which cause amplification of the gene may also be desirable. Furthermore,
sequences that facilitate secretion of the recombinant product from cells, including, but
not limited to, bacteria, yeast, and animal cells, such as secretory signal sequences and/or
prohormone pro region sequences, may also be included. These sequences are well
described in the art.

Nucleic acids encoding wild-type or variant B. fragilis-derived polypeptides may
also be introduced into cells by recombination events. For example, such a sequence can
be introduced into a cell, and thereby effect homologous recombination at the site of an
endogenous gene or a sequence with substantial identity to the gene. Other
recombination-based methods such as nonhomologous recombinations or deletion of
endogenous genes by homologous recombination may also be used.

The nucleic acids of the present invention find use as templates for the
recombinant production of B. fragilis-derived peptides or polypeptides.

IDENTIFICATION AND USE OF B. FRAGILIS NUCLEIC ACID SEQUENCES

The disclosed B. fragilis polypeptide and nucleic acid sequences, or other
sequences that are contained within ORFs, including complete protein-coding sequences,
of which any of the disclosed B. fragilis-specific sequences forms a part, are useful as
target components for diagnosis and/or treatment of B. fragilis-caused infection.

It will be understood that the sequence of an entire protein-coding sequence of
which each disclosed nucleic acid sequence forms a part can be isolated and identified
based on each disclosed sequence. This can be achieved, for example, by using an
isolated nucleic acid encoding the disclosed sequence, or fragments thereof, to prime a
sequencing reaction with genomic B. fragilis DNA as template; this is followed by
sequencing the amplified product. The isolated nucleic acid encoding the disclosed
sequence, or fragments thereof, can also be hybridized to B. fragilis genomic libraries to
identify clones containing additional complete segments of the protein-coding sequence
of which the shorter sequence forms a part. Then, the entire protein-coding sequence, or
fragments thereof, or nucleic acids encoding all or part of the sequence, or
sequence-conservative or function-conservative variants thereof, may be employed in
practicing the present invention.

Preferred sequences are those that are useful in diagnostic and/or therapeutic
applications. Diagnostic applications include without limitation nucleic-acid-based and
antibody-based methods for detecting bacterial infection. Therapeutic applications
include without limitation vaccines, passive immunotherapy, and drug treatments
directed against gene products that are both unique to bacteria and essential for growth
and/or replication of bacteria.



IDENTIFICATION OF NUCLEIC ACIDS ENCODING VACCINE COMPONENTS
AND TARGETS FOR AGENTS EFFECTIVE AGAINST B. FRAGILIS

The disclosed B. fragilis genome sequence includes segments that direct the
synthesis of ribonucleic acids and polypeptides, as well as origins of replication,
promoters, other types of regulatory sequences, and intergenic nucleic acids. The
invention encompasses nucleic acids encoding immunogenic components of vaccines
and targets for agents effective against B. fragilis. Identification of said immunogenic
components involves the determination of the function of the disclosed sequences,
which can be achieved using a variety of approaches. Non-limiting examples of these
approaches are described briefly below.


HOMOLOGY TO KNOWN SEQUENCES: 

Computer-assisted comparison of the disclosed B. fragilis sequences with
previously reported sequences present in publicly available databases is useful for
identifying functional B. fragilis nucleic acid and polypeptide sequences. It will be
understood that protein-coding sequences, for example, may be compared as a whole,
and that a high degree of sequence homology between two proteins (such as, for
example, >80-90%) at the amino acid level indicates that the two proteins also possess
some degree of functional homology, such as, for example, among enzymes involved in
metabolism, DNA synthesis, or cell wall synthesis, and proteins involved in transport,
cell division, etc. In addition, many structural features of particular protein classes have
been identified and correlate with specific consensus sequences, such as, for example,
binding domains for nucleotides, DNA, metal ions, and other small molecules; sites for
covalent modifications such as phosphorylation, acylation, and the like; sites of
protein:protein interactions, etc. These consensus sequences may be quite short and thus
may represent only a fraction of the entire protein-coding sequence. Identification of
such a feature in a B. fragilis sequence is therefore useful in determining the function of
the encoded protein and identifying useful targets of antibacterial drugs.
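
As one illustration of how a short consensus sequence can be located in a predicted protein, the following Python sketch scans an amino acid sequence for the nucleotide-binding P-loop (Walker A) pattern, written here in the commonly cited form [AG]-x(4)-G-K-[ST]. The function name, the choice of motif, and the example sequence are illustrative assumptions and are not taken from the Sequence Listing.

```python
import re

# P-loop (Walker A) consensus in the commonly cited form [AG]-x(4)-G-K-[ST].
# The motif choice is an illustrative assumption, not part of the disclosure.
# The lookahead lets overlapping occurrences be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_motif(protein, pattern=P_LOOP):
    """Return (1-based start, matched text) for every motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


# Hypothetical sequence used only to exercise the function.
example = "MSTLGAPGSGKSTLLKAI"
print(find_motif(example))
```

A hit of this kind only suggests a possible nucleotide-binding function; it would normally be weighed together with whole-sequence homology.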

Of particular relevance to the present invention are structural features that are
common to secretory, transmembrane, and surface proteins, including secretion signal
peptides and hydrophobic transmembrane domains. B. fragilis proteins identified as
containing putative signal sequences and/or transmembrane domains are useful as
immunogenic components of vaccines.
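
A simple way to flag candidate hydrophobic transmembrane segments is a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle scale; the window of 19 residues and the threshold of 1.6 are conventional defaults assumed here for illustration, and this heuristic is a stand-in rather than the method used to annotate the disclosed sequences.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return 1-based start positions of windows whose mean hydropathy
    meets the threshold. Window and threshold are assumed defaults.
    Only the 20 standard residues are supported (others raise KeyError)."""
    seq = protein.upper()
    starts = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[i:i + window]) / window
        if mean >= threshold:
            starts.append(i + 1)
    return starts
```

Sequences shorter than the window return an empty list, so short ORF fragments are never flagged by this sketch.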

Targets for therapeutic drugs according to the invention include, but are not
limited to, polypeptides of the invention, whether unique to B. fragilis or not, that are
essential for growth and/or viability of B. fragilis under at least one growth condition.
Polypeptides essential for growth and/or viability can be determined by examining the
effect of deleting and/or disrupting the genes, i.e., by so-called gene "knockout".
Alternatively, genetic footprinting can be used (Smith et al., 1995, Proc. Natl. Acad. Sci.
USA 92:6479-6483; Published International Application WO 94/26933; U.S. Patent No.
5,612,180). Still other methods for assessing essentiality include the ability to isolate
conditional lethal mutations in the specific gene (e.g., temperature sensitive mutations).
Other useful targets for therapeutic drugs, which include polypeptides that are not
essential for growth or viability per se but lead to loss of viability of the cell, can be used
to target therapeutic agents to cells.

STRAIN-SPECIFIC SEQUENCES:

Because of the evolutionary relationship between different B. fragilis strains, it is
believed that the presently disclosed B. fragilis sequences are useful for identifying,
and/or discriminating between, previously known and new B. fragilis strains. It is
believed that other B. fragilis strains will exhibit at least about 70% sequence homology
with the presently disclosed sequence. Systematic and routine analyses of DNA
sequences derived from samples containing B. fragilis strains, and comparison with the
present sequence, allow for the identification of sequences that can be used to
discriminate between strains, as well as those that are common to all B. fragilis strains.
In one embodiment, the invention provides nucleic acids, including probes, and peptide
and polypeptide sequences that discriminate between different strains of B. fragilis.
Strain-specific components can also be identified functionally by their ability to elicit or
react with antibodies that selectively recognize one or more B. fragilis strains.

In another embodiment, the invention provides nucleic acids, including probes,
and peptide and polypeptide sequences that are common to all B. fragilis strains but are
not found in other bacterial species.

B. FRAGILIS POLYPEPTIDES

This invention encompasses isolated B. fragilis polypeptides encoded by the
disclosed B. fragilis genomic sequences, including the polypeptides of the invention
contained in the Sequence Listing. Polypeptides of the invention are preferably at least
about 5 amino acid residues in length. Using the DNA sequence information provided
herein, the amino acid sequences of the polypeptides encompassed by the invention can
be deduced using methods well-known in the art. It will be understood that the sequence
of an entire nucleic acid encoding a B. fragilis polypeptide can be isolated and
identified based on an ORF that encodes only a fragment of the cognate protein-coding
region. This can be achieved, for example, by using the isolated nucleic acid encoding
the ORF, or fragments thereof, to prime a polymerase chain reaction with genomic B.
fragilis DNA as template; this is followed by sequencing the amplified product.
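
Deducing an amino acid sequence from DNA sequence information can be sketched as a codon-by-codon translation. The example below assumes the standard genetic code, a coding-strand input read from its first base, and an invented test sequence; it is a minimal illustration, not the software used to prepare the Sequence Listing.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding strand from its first base, stopping at a stop codon.
    A trailing partial codon is ignored."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical 12-base sequence used only to exercise the function.
print(translate("ATGGCCAAATAA"))
```

Translating all six reading frames of a contig with such a function is one simple way to enumerate candidate ORFs.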

The polypeptides of the present invention, including function-conservative
variants of the disclosed ORFs, may be isolated from wild-type or mutant B. fragilis
cells, or from heterologous organisms or cells (including, but not limited to, bacteria,
fungi, insect, plant, and mammalian cells) including B. fragilis into which a B. fragilis-
derived protein-coding sequence has been introduced and expressed. Furthermore, the
polypeptides may be part of recombinant fusion proteins.

B. fragilis polypeptides of the invention can be chemically synthesized using
commercially automated procedures such as those referenced herein, including, without
limitation, exclusive solid phase synthesis, partial solid phase methods, fragment
condensation or classical solution synthesis. The polypeptides are preferably prepared by
solid phase peptide synthesis as described by Merrifield, 1963, J. Am. Chem. Soc.
85:2149. The synthesis is carried out with amino acids that are protected at the alpha-
amino terminus. Trifunctional amino acids with labile side-chains are also protected
with suitable groups to prevent undesired chemical reactions from occurring during the
assembly of the polypeptides. The alpha-amino protecting group is selectively removed
to allow subsequent reaction to take place at the amino-terminus. The conditions for the
removal of the alpha-amino protecting group do not remove the side-chain protecting
groups.

Methods for polypeptide purification are well-known in the art, including, 
without limitation, preparative disc-gel electrophoresis, isoelectric focusing, HPLC, 
reversed-phase HPLC, gel filtration, ion exchange and partition chromatography, and 
countercurrent distribution. For some purposes, it is preferable to produce the 
polypeptide in a recombinant system in which the B. fragilis protein contains an
additional sequence tag that facilitates purification, such as, but not limited to, a
polyhistidine sequence. The polypeptide can then be purified from a crude lysate of the
host cell by chromatography on an appropriate solid-phase matrix. Alternatively,
antibodies produced against a B. fragilis protein or against peptides derived therefrom
can be used as purification reagents. Other purification methods are possible. 

The present invention also encompasses derivatives and homologues of B.
fragilis-encoded polypeptides. For some purposes, nucleic acid sequences encoding the
peptides may be altered by substitutions, additions, or deletions that provide for
functionally equivalent molecules, i.e., function-conservative variants. For example, one
or more amino acid residues within the sequence can be substituted by another amino
acid of similar properties, such as, for example, positively charged amino acids (arginine,
lysine, and histidine); negatively charged amino acids (aspartate and glutamate); polar
neutral amino acids; and non-polar amino acids.

The isolated polypeptides may be modified by, for example, phosphorylation, 
sulfation, acylation, or other protein modifications. They may also be modified with a 
label capable of providing a detectable signal, either directly or indirectly, including, but 
not limited to, radioisotopes and fluorescent compounds. 

To identify B. fragilis-derived polypeptides for use in the present invention,
essentially the complete genomic sequence of a virulent, methicillin-resistant isolate of
Bacteroides fragilis was analyzed. While, in very rare instances, a nucleic acid
sequencing error may be revealed, resolving a rare sequencing error is well within the art,
and such an occurrence will not prevent one skilled in the art from practicing the
invention.

Also encompassed are any B. fragilis polypeptide sequences that are contained 
within the open reading frames (ORFs), including complete protein-coding sequences, of 
which any of SEQ ID NO: 1 - SEQ ID NO: 5222 forms a part. Table 2, which is 
appended herewith and which forms part of the present specification, provides a putative 
identification of the particular function of a polypeptide which is encoded by each ORF,
based on the homology match (determined by the BLAST algorithm) of the predicted
polypeptide with known proteins encoded by ORFs in other organisms. As a result, one
skilled in the art can use the polypeptides of the present invention for commercial and
industrial purposes consistent with the type of putative identification of the polypeptide. 

The present invention provides a library of B. fragilis-derived polypeptide
sequences, and a corresponding library of nucleic acid sequences encoding the
polypeptides, wherein the polypeptides themselves, or polypeptides contained within
ORFs of which they form a part, comprise sequences that are contemplated for use as 
components of vaccines. Non-limiting examples of such sequences are listed by SEQ ID 
NO in Table 2, which is appended herewith and which forms part of the present 
specification. 

The present invention also provides a library of B. fragilis-derived polypeptide
sequences, and a corresponding library of nucleic acid sequences encoding the
polypeptides, wherein the polypeptides themselves, or polypeptides contained within
ORFs of which they form a part, comprise sequences lacking homology to any known
prokaryotic or eukaryotic sequences. Such libraries provide probes, primers, and markers
which can be used to diagnose B. fragilis infection, including use as markers in
epidemiological studies. Non-limiting examples of such sequences are listed by SEQ ID
NO in Table 2, which is appended hereto and part hereof.

The present invention also provides a library of B. fragilis-derived polypeptide
sequences, and a corresponding library of nucleic acid sequences encoding the
polypeptides, wherein the polypeptides themselves, or polypeptides contained within
ORFs of which they form a part, comprise targets for therapeutic drugs. 

SPECIFIC EXAMPLE: DETERMINATION OF BACTEROIDES PROTEIN
ANTIGENS FOR ANTIBODY AND VACCINE DEVELOPMENT
The selection of Bacteroides protein antigens for vaccine development can be
derived from the nucleic acids encoding B. fragilis polypeptides. First, the ORFs can be
analyzed for homology to other known exported or membrane proteins and analyzed
using the discriminant analysis described by Klein, et al. (Klein, P., Kanehisa, M., and
DeLisi, C. (1985) Biochimica et Biophysica Acta 815, 468-476) for predicting exported
and membrane proteins.

Homology searches can be performed using the BLAST algorithm contained in
the Wisconsin Sequence Analysis Package (Genetics Computer Group, University
Research Park, 575 Science Drive, Madison, WI 53711) to compare each predicted ORF
amino acid sequence with all sequences found in the current GenBank, SWISS-PROT
and PIR databases. BLAST searches for local alignments between the ORF and the
databank sequences and reports a probability score which indicates the probability of
finding this sequence by chance in the database. ORFs with significant homology (e.g.
probabilities lower than 1x10^-6 that the homology is only due to random chance) to
membrane or exported proteins represent protein antigens for vaccine development.
Possible functions can be provided to B. fragilis genes based on sequence homology to
genes cloned in other organisms.
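
The selection rule just described (a chance probability below 1x10^-6 together with a match to a membrane or exported protein) can be sketched as a simple filter over search results. The `Hit` record, its field names, and the keyword list are hypothetical conveniences for illustration; they do not correspond to the output format of any particular BLAST implementation.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    orf_id: str          # hypothetical ORF label
    description: str     # database description of the matched protein
    probability: float   # chance probability reported for the match


KEYWORDS = ("membrane", "exported", "secreted")  # assumed keyword list


def candidate_antigens(hits, cutoff=1e-6):
    """Return sorted ORF ids whose match probability is below the cutoff
    and whose matched protein is described as membrane or exported."""
    return sorted({
        h.orf_id for h in hits
        if h.probability < cutoff
        and any(k in h.description.lower() for k in KEYWORDS)
    })
```

In practice the description-based test would be replaced by curated annotation of the matched database entries, since free-text keywords are an unreliable proxy for localization.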

Discriminant analysis (Klein, et al. supra) can be used to examine the ORF amino
acid sequences. This algorithm uses the intrinsic information contained in the ORF
amino acid sequence and compares it to information derived from the properties of
known membrane and exported proteins. This comparison predicts which proteins will
be exported, membrane associated or cytoplasmic. ORF amino acid sequences identified
as exported or membrane associated by this algorithm are likely protein antigens for
vaccine development.



PRODUCTION OF FRAGMENTS AND ANALOGS OF B. FRAGILIS NUCLEIC
ACIDS AND POLYPEPTIDES

Based on the discovery of the B. fragilis gene products of the invention provided
in the Sequence Listing, one skilled in the art can alter the disclosed structure of B.
fragilis genes, e.g., by producing fragments or analogs, and test the newly produced
structures for activity. Examples of techniques known to those skilled in the relevant art
which allow the production and testing of fragments and analogs are discussed below.
These, or analogous methods, can be used to make and screen libraries of polypeptides,
e.g., libraries of random peptides or libraries of fragments or analogs of cellular proteins
for the ability to bind B. fragilis polypeptides. Such screens are useful for the
identification of inhibitors of B. fragilis.


GENERATION OF FRAGMENTS 

Fragments of a protein can be produced in several ways, e.g., recombinantly, by 
proteolytic digestion, or by chemical synthesis. Internal or terminal fragments of a 
polypeptide can be generated by removing one or more nucleotides from one end (for a
terminal fragment) or both ends (for an internal fragment) of a nucleic acid which
encodes the polypeptide. Expression of the mutagenized DNA produces polypeptide
fragments. Digestion with "end-nibbling" endonucleases can thus generate DNAs which
encode an array of fragments. DNAs which encode fragments of a protein can also be
generated by random shearing, restriction digestion or a combination of the above-
discussed methods.

Fragments can also be chemically synthesized using techniques known in the art 
such as conventional Merrifield solid phase f-Moc or t-Boc chemistry. For example, 
peptides of the present invention may be arbitrarily divided into fragments of desired 
length with no overlap of the fragments, or divided into overlapping fragments of a
desired length.
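
Dividing a peptide into non-overlapping or overlapping fragments of a desired length can be sketched as follows. The function name, its parameters, and the example peptide are illustrative assumptions.

```python
def fragments(peptide, length, overlap=0):
    """Split a peptide into fragments of a given length.

    overlap=0 gives end-to-end fragments; overlap>0 gives fragments that
    share `overlap` residues with their predecessor. A final fragment may
    be shorter than `length`, and a trailing piece that adds no new
    residues is dropped.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be >= 0 and smaller than length")
    step = length - overlap
    return [peptide[i:i + length]
            for i in range(0, len(peptide), step)
            if i == 0 or i + overlap < len(peptide)]


# Hypothetical 10-residue peptide used only to exercise the function.
print(fragments("ABCDEFGHIJ", 4))
print(fragments("ABCDEFGHIJ", 4, overlap=2))
```

Overlapping sets are often preferred for epitope mapping because an epitope that straddles a boundary in one fragment is intact in its neighbor.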

ALTERATION OF NUCLEIC ACIDS AND POLYPEPTIDES: RANDOM METHODS 

Amino acid sequence variants of a protein can be prepared by random 
mutagenesis of DNA which encodes a protein or a particular domain or region of a 
protein. Useful methods include PCR mutagenesis and saturation mutagenesis. A library
of random amino acid sequence variants can also be generated by the synthesis of a set of
degenerate oligonucleotide sequences. (Methods for screening proteins in a library of
variants are described elsewhere herein.)

PCR MUTAGENESIS

In PCR mutagenesis, reduced Taq polymerase fidelity is used to introduce
random mutations into a cloned fragment of DNA (Leung et al., 1989, Technique 1:11-
15). The DNA region to be mutagenized is amplified using the polymerase chain
reaction (PCR) under conditions that reduce the fidelity of DNA synthesis by Taq DNA
polymerase, e.g., by using a dGTP/dATP ratio of five and adding Mn2+ to the PCR
reaction. The pool of amplified DNA fragments is inserted into appropriate cloning
vectors to provide random mutant libraries.






SATURATION MUTAGENESIS 

Saturation mutagenesis allows for the rapid introduction of a large number of
single base substitutions into cloned DNA fragments (Myers et al., 1985, Science
229:242). This technique includes generation of mutations, e.g., by chemical treatment
or irradiation of single-stranded DNA in vitro, and synthesis of a complementary DNA
strand. The mutation frequency can be modulated by modulating the severity of the
treatment, and essentially all possible base substitutions can be obtained. Because this
procedure does not involve a genetic selection for mutant fragments, both neutral
substitutions and those that alter function are obtained. The distribution of point
mutations is not biased toward conserved sequence elements.
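
The space of "essentially all possible base substitutions" for a cloned fragment can be enumerated directly: each position admits three alternative bases, so a fragment of N bases has 3N single-substitution variants. The sketch below lists them in silico; it models only the outcome of the procedure, not the chemical treatment itself, and the function name is an assumption.

```python
def single_base_substitutions(dna):
    """Yield (1-based position, original base, new base, variant sequence)
    for every possible single-base substitution of the input."""
    dna = dna.upper()
    for i, ref in enumerate(dna):
        for alt in "ACGT":
            if alt != ref:
                yield i + 1, ref, alt, dna[:i] + alt + dna[i + 1:]


# Hypothetical 3-base input: three substitutions are possible per position.
variants = list(single_base_substitutions("ATG"))
print(len(variants))
```

Comparing such an enumerated list against the sequenced clones of a mutant library gives a rough measure of how close the library is to saturation.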

DEGENERATE OLIGONUCLEOTIDES 

A library of homologs can also be generated from a set of degenerate
oligonucleotide sequences. Chemical synthesis of a degenerate sequence can be carried
out in an automatic DNA synthesizer, and the synthetic genes then ligated into an
appropriate expression vector. The synthesis of degenerate oligonucleotides is known in
the art (see for example, Narang, SA (1983) Tetrahedron 39:3; Itakura et al. (1981)
Recombinant DNA, Proc 3rd Cleveland Sympos. Macromolecules, ed. AG Walton,
Amsterdam: Elsevier pp273-289; Itakura et al. (1984) Annu. Rev. Biochem. 53:323;
Itakura et al. (1984) Science 198:1056; Ike et al. (1983) Nucleic Acid Res. 11:477). Such
techniques have been employed in the directed evolution of other proteins (see, for
example, Scott et al. (1990) Science 249:386-390; Roberts et al. (1992) PNAS 89:2429-
2433; Devlin et al. (1990) Science 249:404-406; Cwirla et al. (1990) PNAS 87:6378-
6382; as well as U.S. Patents Nos. 5,223,409, 5,198,346, and 5,096,815).
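
The set of sequences represented by a degenerate oligonucleotide can be written out by expanding each IUPAC ambiguity code into the bases it stands for. The sketch below does this for short oligonucleotides; the function name and the example codons are illustrative assumptions.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(oligo):
    """Return every concrete sequence encoded by a degenerate oligonucleotide."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in oligo.upper()))]


# GCN covers the four alanine codons; NNK is a commonly used randomized codon.
print(expand("GCN"))
print(len(expand("NNK")))
```

Because the number of sequences grows multiplicatively with each degenerate position, full expansion is only practical for short stretches; for longer oligonucleotides the library size is usually computed rather than enumerated.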



ALTERATION OF NUCLEIC ACIDS AND POLYPEPTIDES: METHODS FOR 
DIRECTED MUTAGENESIS 

Non-random, or directed, mutagenesis techniques can be used to provide specific
sequences or mutations in specific regions. These techniques can be used to create 
variants which include, e.g., deletions, insertions, or substitutions, of residues of the 
known amino acid sequence of a protein. The sites for mutation can be modified 
individually or in series, e.g., by (1) substituting first with conserved amino acids and 
then with more radical choices depending upon results achieved, (2) deleting the target 
residue, or (3) inserting residues of the same or a different class adjacent to the located 
site, or combinations of options 1-3. 


ALANINE SCANNING MUTAGENESIS 

Alanine scanning mutagenesis is a useful method for identification of certain
residues or regions of the desired protein that are preferred locations or domains for
mutagenesis (Cunningham and Wells, Science 244:1081-1085, 1989). In alanine
scanning, a residue or group of target residues is identified (e.g., charged residues such
as Arg, Asp, His, Lys, and Glu) and replaced by a neutral or negatively charged amino
acid (most preferably alanine or polyalanine). Replacement of an amino acid can affect
the interaction of the amino acids with the surrounding aqueous environment in or
outside the cell. Those domains demonstrating functional sensitivity to the substitutions
are then refined by introducing further or other variants at or for the sites of substitution.
Thus, while the site for introducing an amino acid sequence variation is predetermined,
the nature of the mutation per se need not be predetermined. For example, to optimize
the performance of a mutation at a given site, alanine scanning or random mutagenesis
may be conducted at the target codon or region and the expressed desired protein subunit
variants are screened for the optimal combination of desired activity.
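
The design step of an alanine scan, in which each charged residue is replaced in turn by alanine, can be sketched as follows. The target set reflects the residues named above (Arg, Asp, His, Lys, Glu); the function name, the mutation-label format, and the example peptide are assumptions made for illustration.

```python
CHARGED = set("RDHKE")  # Arg, Asp, His, Lys, Glu, as named in the text


def alanine_scan(protein, targets=CHARGED):
    """Return {mutation label: variant sequence}, with one target residue
    replaced by alanine in each variant (labels such as 'K2A')."""
    protein = protein.upper()
    return {
        f"{aa}{i + 1}A": protein[:i] + "A" + protein[i + 1:]
        for i, aa in enumerate(protein)
        if aa in targets
    }


# Hypothetical 4-residue peptide used only to exercise the function.
print(alanine_scan("MKDL"))
```

Each variant differs from the parent at a single position, so a change in activity can be attributed to that residue.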



OLIGONUCLEOTIDE-MEDIATED MUTAGENESIS 
Oligonucleotide-mediated mutagenesis is a useful method for preparing 
substitution, deletion, and insertion variants of DNA, see, e.g., Adelman et al. (DNA
2:183, 1983). Briefly, the desired DNA is altered by hybridizing an oligonucleotide
encoding a mutation to a DNA template, where the template is the single-stranded form
of a plasmid or bacteriophage containing the unaltered or native DNA sequence of the
desired protein. After hybridization, a DNA polymerase is used to synthesize an entire
second complementary strand of the template that will thus incorporate the
oligonucleotide primer, and will code for the selected alteration in the desired protein
DNA. Generally, oligonucleotides of at least about 25 nucleotides in length are used.
An optimal oligonucleotide will have 12 to 15 nucleotides that are completely
complementary to the template on either side of the nucleotide(s) coding for the
mutation. This ensures that the oligonucleotide will hybridize properly to the single-
stranded DNA template molecule. The oligonucleotides are readily synthesized using
techniques known in the art such as that described by Crea et al. (Proc. Natl. Acad. Sci.
USA, 75:5765 [1978]).
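
The layout of such an oligonucleotide, with the altered nucleotides flanked on each side by 12 to 15 perfectly matched nucleotides, can be sketched as a string operation. In this illustration the input is assumed to be written in the sense of the strand to be synthesized (that is, the strand complementary to the single-stranded template), and the function name, parameters, and test sequence are hypothetical.

```python
def mutagenic_oligo(sequence, start, length, replacement, flank=15):
    """Build an oligonucleotide carrying `replacement` in place of
    sequence[start:start + length] (0-based), flanked on each side by
    `flank` nucleotides copied unchanged from the input.

    `sequence` is assumed to be the sense of the strand being synthesized,
    so the returned oligonucleotide anneals to the complementary
    single-stranded template.
    """
    end = start + length
    if start < flank or end + flank > len(sequence):
        raise ValueError("not enough sequence on one side for the flank")
    return sequence[start - flank:start] + replacement + sequence[end:end + flank]


# Hypothetical sequence: change the codon GAA to GCA with 15-nucleotide flanks.
seq = "A" * 15 + "GAA" + "C" * 15
print(mutagenic_oligo(seq, 15, 3, "GCA"))
```

Passing an empty `replacement` models a deletion, and passing `length=0` with a non-empty `replacement` models an insertion.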

CASSETTE MUTAGENESIS 
Another method for preparing variants, cassette mutagenesis, is based on the
technique described by Wells et al. (Gene, 34:315 [1985]). The starting material is a
plasmid (or other vector) which includes the protein subunit DNA to be mutated. The
codon(s) in the protein subunit DNA to be mutated are identified. There must be a
unique restriction endonuclease site on each side of the identified mutation site(s). If no
such restriction sites exist, they may be generated using the above-described
oligonucleotide-mediated mutagenesis method to introduce them at appropriate locations
in the desired protein subunit DNA. After the restriction sites have been introduced into
the plasmid, the plasmid is cut at these sites to linearize it. A double-stranded
oligonucleotide encoding the sequence of the DNA between the restriction sites but
containing the desired mutation(s) is synthesized using standard procedures. The two
strands are synthesized separately and then hybridized together using standard
techniques. This double-stranded oligonucleotide is referred to as the cassette. This
cassette is designed to have 3' and 5' ends that are compatible with the ends of the
linearized plasmid, such that it can be directly ligated to the plasmid. This plasmid now
contains the mutated desired protein subunit DNA sequence.

COMBINATORIAL MUTAGENESIS 

Combinatorial mutagenesis can also be used to generate mutants (Ladner et al.,
WO 88/06630). In this method, the amino acid sequences for a group of homologs or
other related proteins are aligned, preferably to promote the highest homology possible.
All of the amino acids which appear at a given position of the aligned sequences can be
selected to create a degenerate set of combinatorial sequences. The variegated library of
variants is generated by combinatorial mutagenesis at the nucleic acid level, and is
encoded by a variegated gene library. For example, a mixture of synthetic
oligonucleotides can be enzymatically ligated into gene sequences such that the
degenerate set of potential sequences are expressible as individual peptides, or
alternatively, as a set of larger fusion proteins containing the set of degenerate sequences.
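
Collecting the amino acids that appear at each aligned position, and counting the resulting degenerate set, can be sketched as follows. The input is assumed to be a list of pre-aligned, equal-length sequences with '-' as the gap character; the function names and the toy alignment are illustrative assumptions.

```python
from math import prod


def residues_per_position(aligned):
    """For equal-length aligned sequences, list the distinct residues
    seen at each column (gap characters '-' are ignored)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    return [sorted({s[i] for s in aligned} - {"-"}) for i in range(len(aligned[0]))]


def library_size(aligned):
    """Number of distinct sequences in the degenerate combinatorial set.
    An all-gap column contributes a factor of 1."""
    return prod(max(len(col), 1) for col in residues_per_position(aligned))


# Hypothetical three-sequence alignment used only to exercise the functions.
alignment = ["MKL", "MRL", "MKV"]
print(residues_per_position(alignment))
print(library_size(alignment))
```

The size estimate is useful when judging whether a variegated gene library can be sampled exhaustively by a given screening method.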

OTHER MODIFICATIONS OF B. FRAGILIS NUCLEIC ACIDS AND
POLYPEPTIDES

It is possible to modify the structure of a B. fragilis polypeptide for such
purposes as increasing solubility or enhancing stability (e.g., shelf life ex vivo and
resistance to proteolytic degradation in vivo). A modified B. fragilis protein or peptide
can be produced in which the amino acid sequence has been altered, such as by amino
acid substitution, deletion, or addition as described herein.

A B. fragilis peptide can also be modified by substitution of cysteine residues
preferably with alanine, serine, threonine, leucine or glutamic acid residues to minimize
dimerization via disulfide linkages. In addition, amino acid side chains of fragments of
the protein of the invention can be chemically modified. Another modification is
cyclization of the peptide.

In order to enhance stability and/or reactivity, a B. fragilis polypeptide can be
modified to incorporate one or more polymorphisms in the amino acid sequence of the
protein resulting from any natural allelic variation. Additionally, D-amino acids, non-
natural amino acids, or non-amino acid analogs can be substituted or added to produce a
modified protein within the scope of this invention. Furthermore, a B. fragilis
polypeptide can be modified using polyethylene glycol (PEG) according to the method of
A. Sehon and co-workers (Wie et al., supra) to produce a protein conjugated with PEG.
In addition, PEG can be added during chemical synthesis of the protein. Other
modifications of B. fragilis proteins include reduction/alkylation (Tarr, Methods of
Protein Microcharacterization, J. E. Silver ed., Humana Press, Clifton NJ 155-194
(1986)); acylation (Tarr, supra); chemical coupling to an appropriate carrier (Mishell and
Shiigi, eds, Selected Methods in Cellular Immunology, WH Freeman, San Francisco, CA
(1980); U.S. Patent 4,939,239); or mild formalin treatment (Marsh, (1971) Int. Arch. of
Allergy and Appl. Immunol., 41:199-215).

To facilitate purification and potentially increase solubility of a B. fragilis
protein or peptide, it is possible to add an amino acid fusion moiety to the peptide
backbone. For example, hexa-histidine can be added to the protein for purification by
immobilized metal ion affinity chromatography (Hochuli, E. et al., (1988)
Bio/Technology, 6:1321-1325). In addition, to facilitate isolation of peptides free of
irrelevant sequences, specific endoprotease cleavage sites can be introduced between the
sequences of the fusion moiety and the peptide.

To potentially aid proper antigen processing of epitopes within a B. fragilis
polypeptide, canonical protease sensitive sites can be engineered between regions, each
comprising at least one epitope, via recombinant or synthetic methods. For example,
charged amino acid pairs, such as KK or RR, can be introduced between regions within a 
protein or fragment during recombinant construction thereof. The resulting peptide can 
be rendered sensitive to cleavage by cathepsin and/or other trypsin-like enzymes which 
would generate portions of the protein containing one or more epitopes. In addition, such 
charged amino acid residues can result in an increase in the solubility of the peptide. 

PRIMARY METHODS FOR SCREENING POLYPEPTIDES AND ANALOGS 

Various techniques are known in the art for screening generated mutant gene 
products. Techniques for screening large gene libraries often include cloning the gene 
library into replicable expression vectors, transforming appropriate cells with the 
resulting library of vectors, and expressing the genes under conditions in which detection 
of a desired activity, e.g., in this case, binding to B. fragilis polypeptide or an interacting 
protein, facilitates relatively easy isolation of the vector encoding the gene whose product 
was detected. Each of the techniques described below is amenable to high through-put 
analysis for screening large numbers of sequences created, e.g., by random mutagenesis 
techniques. 

TWO HYBRID SYSTEMS 

Two hybrid assays such as the system described below (as with the other
screening methods described herein) can be used to identify polypeptides, e.g.,
fragments or analogs of a naturally-occurring B. fragilis polypeptide, e.g., of cellular
proteins, or of randomly generated polypeptides which bind to a B. fragilis protein.
(The B. fragilis domain is used as the bait protein and the library of variants is
expressed as prey fusion proteins.) In an analogous fashion, a two hybrid assay (as with
the other screening methods described herein) can be used to find polypeptides which
bind a B. fragilis polypeptide.

DISPLAY LIBRARIES 

In one approach to screening assays, the Bacteroides peptides are displayed on the 
surface of a cell or viral particle, and the ability of particular cells or viral particles to 
bind an appropriate receptor protein via the displayed product is detected in a "panning 
assay". For example, the gene library can be cloned into the gene for a surface 
membrane protein of a bacterial cell, and the resulting fusion protein detected by panning 
(Ladner et al., WO 88/06630; Fuchs et al. (1991) Bio/Technology 9:1370-1371; and 
Goward et al. (1992) TIBS 18:136-140). In a similar fashion, a detectably labeled ligand 
can be used to score for potentially functional peptide homologs. Fluorescently labeled 
ligands, e.g., receptors, can be used to detect homologs which retain ligand-binding 
activity. The use of fluorescently labeled ligands allows cells to be visually inspected
and separated under a fluorescence microscope, or, where the morphology of the cell 
permits, to be separated by a fluorescence-activated cell sorter. 

A gene library can be expressed as a fusion protein on the surface of a viral
particle. For instance, in the filamentous phage system, foreign peptide sequences can be
expressed on the surface of infectious phage, thereby conferring two significant benefits.
First, since these phage can be applied to affinity matrices at concentrations well over
10^13 phage per milliliter, a large number of phage can be screened at one time. Second,
since each infectious phage displays a gene product on its surface, if a particular phage is
recovered from an affinity matrix in low yield, the phage can be amplified by another
round of infection. The group of almost identical E. coli filamentous phages, M13, fd,
and f1, are most often used in phage display libraries. Either of the phage gIII or gVIII
coat proteins can be used to generate fusion proteins without disrupting the ultimate
packaging of the viral particle. Foreign epitopes can be expressed at the NH2-terminal
end of pIII and phage bearing such epitopes recovered from a large excess of phage
lacking this epitope (Ladner et al. PCT publication WO 90/02909; Garrard et al., PCT
publication WO 92/09690; Marks et al. (1992) J. Biol. Chem. 267:16007-16010;
Griffiths et al. (1993) EMBO J. 12:725-734; Clackson et al. (1991) Nature 352:624-628;
and Barbas et al. (1992) PNAS 89:4457-4461).

A common approach uses the maltose receptor of E. coli (the outer membrane
protein, LamB) as a peptide fusion partner (Charbit et al. (1986) EMBO J. 5, 3029-3037).
Oligonucleotides have been inserted into plasmids encoding the LamB gene to produce
peptides fused into one of the extracellular loops of the protein. These peptides are
available for binding to ligands, e.g., to antibodies, and can elicit an immune response
when the cells are administered to animals. Other cell surface proteins, e.g., OmpA
(Schorr et al. (1991) Vaccines 91, pp. 387-392), PhoE (Agterberg et al. (1990) Gene 88,
37-45), and PAL (Fuchs et al. (1991) Bio/Tech 9, 1369-1372), as well as large bacterial
surface structures have served as vehicles for peptide display. Peptides can be fused to
pilin, a protein which polymerizes to form the pilus, a conduit for interbacterial exchange
of genetic information (Thiry et al. (1989) Appl. Environ. Microbiol. 55, 984-993).
Because of its role in interacting with other cells, the pilus provides a useful support for
the presentation of peptides to the extracellular environment. Another large surface
structure used for peptide display is the bacterial motive organ, the flagellum. Fusion of
peptides to the subunit protein flagellin offers a dense array of many peptide copies on
the host cells (Kuwajima et al. (1988) Bio/Tech. 6, 1080-1083). Surface proteins of other
bacterial species have also served as peptide fusion partners. Examples include the
Staphylococcus protein A and the outer membrane IgA protease of Neisseria (Hansson et
al. (1992) J. Bacteriol. 174, 4239-4245 and Klauser et al. (1990) EMBO J. 9, 1991-
1999).

In the filamentous phage systems and the LamB system described above, the
physical link between the peptide and its encoding DNA occurs by the containment of
the DNA within a particle (cell or phage) that carries the peptide on its surface.
Capturing the peptide captures the particle and the DNA within. An alternative scheme
uses the DNA-binding protein LacI to form a link between peptide and DNA (Cull et al.
(1992) PNAS USA 89:1865-1869). This system uses a plasmid containing the LacI gene
with an oligonucleotide cloning site at its 3'-end. Under the controlled induction by
arabinose, a LacI-peptide fusion protein is produced. This fusion retains the natural
ability of LacI to bind to a short DNA sequence known as the LacO operator (LacO). By
installing two copies of LacO on the expression plasmid, the LacI-peptide fusion binds
tightly to the plasmid that encoded it. Because the plasmids in each cell contain only a
single oligonucleotide sequence and each cell expresses only a single peptide sequence,
the peptides become specifically and stably associated with the DNA sequence that
directed its synthesis. The cells of the library are gently lysed and the peptide-DNA
complexes are exposed to a matrix of immobilized receptor to recover the complexes
containing active peptides. The associated plasmid DNA is then reintroduced into cells
for amplification and DNA sequencing to determine the identity of the peptide ligands.
As a demonstration of the practical utility of the method, a large random library of
dodecapeptides was made and selected on a monoclonal antibody raised against the
opioid peptide dynorphin B. A cohort of peptides was recovered, all related by a
consensus sequence corresponding to a six-residue portion of dynorphin B (Cull et al.
(1992) Proc. Natl. Acad. Sci. U.S.A. 89:1865-1869).

This scheme, sometimes referred to as peptides-on-plasmids, differs in two
important ways from the phage display methods. First, the peptides are attached to the
C-terminus of the fusion protein, resulting in the display of the library members as
peptides having free carboxy termini. Both of the filamentous phage coat proteins, pIII
and pVIII, are anchored to the phage through their C-termini, and the guest peptides are
placed into the outward-extending N-terminal domains. In some designs, the
phage-displayed peptides are presented right at the amino terminus of the fusion protein
(Cwirla et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 6378-6382). A second difference
is the set of biological biases affecting the population of peptides actually present in the
libraries. The LacI fusion molecules are confined to the cytoplasm of the host cells. The
phage coat fusions are exposed briefly to the cytoplasm during translation but are rapidly
secreted through the inner membrane into the periplasmic compartment, remaining
anchored in the membrane by their C-terminal hydrophobic domains, with the N-termini,
containing the peptides, protruding into the periplasm while awaiting assembly into
phage particles. The peptides in the LacI and phage libraries may differ significantly as a
result of their exposure to different proteolytic activities. The phage coat proteins require
transport across the inner membrane and signal peptidase processing as a prelude to
incorporation into phage. Certain peptides exert a deleterious effect on these processes
and are underrepresented in the libraries (Gallop et al. (1994) J. Med. Chem. 37(9):1233-1251).
These particular biases are not a factor in the LacI display system.



The number of small peptides available in recombinant random libraries is
enormous. Libraries of 10^7-10^9 independent clones are routinely prepared. Libraries as
large as 10 recombinants have been created, but this size approaches the practical limit
for clone libraries. This limitation in library size occurs at the step of transforming the
DNA containing randomized segments into the host bacterial cells. To circumvent this
limitation, an in vitro system based on the display of nascent peptides in polysome
complexes has recently been developed. This display library method has the potential of
producing libraries 3-6 orders of magnitude larger than the currently available
phage/phagemid or plasmid libraries. Furthermore, the construction of the libraries,
expression of the peptides, and screening, is done in an entirely cell-free format.
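
The relationship between library size and the number of possible random peptides can be illustrated with a short calculation. The Python sketch below is offered only as an illustration: it assumes that each position of a random peptide may be any of the 20 standard amino acids, and the helper names `sequence_space` and `max_coverage`, as well as the example library size, are hypothetical and not part of this specification.

```python
# Illustrative sketch only. Assumption: every position in a random peptide
# can be any of the 20 standard amino acids.
AMINO_ACIDS = 20


def sequence_space(length: int) -> int:
    """Return the number of distinct peptides of the given length."""
    if length < 1:
        raise ValueError("peptide length must be at least 1")
    return AMINO_ACIDS ** length


def max_coverage(library_size: float, length: int) -> float:
    """Upper bound on the fraction of sequence space a library can sample.

    This is an upper bound because it assumes that every clone in the
    library encodes a different peptide.
    """
    return min(1.0, library_size / sequence_space(length))


if __name__ == "__main__":
    example_library_size = 1e9  # hypothetical number of independent clones
    for length in (6, 10, 12):
        print(length, sequence_space(length),
              max_coverage(example_library_size, length))
```

Under these assumptions, a library of a fixed number of clones can in principle cover all hexapeptides but only a small fraction of decapeptides or dodecapeptides, which is one way to see why larger cell-free libraries are of interest.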

In one application of this method (Gallop et al. (1994) J. Med. Chem. 37(9):1233-1251),
a molecular DNA library encoding 10^12 decapeptides was constructed and the
library expressed in an E. coli S30 in vitro coupled transcription/translation system.
Conditions were chosen to stall the ribosomes on the mRNA, causing the accumulation
of a substantial proportion of the RNA in polysomes and yielding complexes containing
nascent peptides still linked to their encoding RNA. The polysomes are sufficiently
robust to be affinity purified on immobilized receptors in much the same way as the more
conventional recombinant peptide display libraries are screened. RNA from the bound
complexes is recovered, converted to cDNA, and amplified by PCR to produce a
template for the next round of synthesis and screening. The polysome display method
can be coupled to the phage display system. Following several rounds of screening,
cDNA from the enriched pool of polysomes was cloned into a phagemid vector. This
vector serves as both a peptide expression vector, displaying peptides fused to the coat
proteins, and as a DNA sequencing vector for peptide identification. By expressing the
polysome-derived peptides on phage, one can either continue the affinity selection
procedure in this format or assay the peptides on individual clones for binding activity in
a phage ELISA, or for binding specificity in a competition phage ELISA (Barrett et al.
(1992) Anal. Biochem. 204, 357-364). To identify the sequences of the active peptides,
one sequences the DNA produced by the phagemid host.

SECONDARY SCREENING OF POLYPEPTIDES AND ANALOGS 

The high-throughput assays described above can be followed by secondary
screens in order to identify further biological activities which will, e.g., allow one skilled
in the art to differentiate agonists from antagonists. The type of secondary screen used
will depend on the desired activity that needs to be tested. For example, an assay can be
developed in which the ability to inhibit an interaction between a protein of interest and
its respective ligand can be used to identify antagonists from a group of peptide
fragments isolated through one of the primary screens described above.

Therefore, methods for generating fragments and analogs and testing them for
activity are known in the art. Once the core sequence of interest is identified, it is routine
for one skilled in the art to obtain analogs and fragments.

PEPTIDE MIMETICS OF B. FRAGILIS POLYPEPTIDES

The invention also provides for reduction of the protein binding domains of the
subject B. fragilis polypeptides to generate mimetics, e.g., peptide or non-peptide agents.
The peptide mimetics are able to disrupt binding of a polypeptide to its counter ligand,
e.g., in the case of a B. fragilis polypeptide binding to a naturally occurring ligand. The
critical residues of a subject B. fragilis polypeptide which are involved in molecular
recognition of a polypeptide can be determined and used to generate B. fragilis-derived
peptidomimetics which competitively or noncompetitively inhibit binding of the B.
fragilis polypeptide with an interacting polypeptide (see, for example, European patent
applications EP-412,762A and EP-B31,080A).

For example, scanning mutagenesis can be used to map the amino acid residues
of a particular B. fragilis polypeptide involved in binding an interacting polypeptide;
peptidomimetic compounds (e.g., diazepine or isoquinoline derivatives) can then be generated
which mimic those residues in binding to an interacting polypeptide, and which therefore
can inhibit binding of a B. fragilis polypeptide to an interacting polypeptide and thereby
interfere with the function of the B. fragilis polypeptide. For instance, non-hydrolyzable
peptide analogs of such residues can be generated using benzodiazepine (e.g., see
Freidinger et al. in Peptides: Chemistry and Biology, G.R. Marshall ed., ESCOM
Publisher: Leiden, Netherlands, 1988), azepine (e.g., see Huffman et al. in Peptides:
Chemistry and Biology, G.R. Marshall ed., ESCOM Publisher: Leiden, Netherlands,
1988), substituted gamma lactam rings (Garvey et al. in Peptides: Chemistry and Biology,
G.R. Marshall ed., ESCOM Publisher: Leiden, Netherlands, 1988), keto-methylene
pseudopeptides (Ewenson et al. (1986) J. Med. Chem. 29:295; and Ewenson et al. in
Peptides: Structure and Function (Proceedings of the 9th American Peptide Symposium)
Pierce Chemical Co. Rockland, IL, 1985), β-turn dipeptide cores (Nagai et al. (1985)
Tetrahedron Lett. 26:647; and Sato et al. (1986) J. Chem. Soc. Perkin Trans. 1:1231), and
β-aminoalcohols (Gordon et al. (1985) Biochem. Biophys. Res. Commun. 126:419; and et al.
(1986) Biochem. Biophys. Res. Commun. 134:71).

VACCINE FORMULATIONS FOR B. FRAGILIS NUCLEIC ACIDS AND
POLYPEPTIDES

This invention also features vaccine compositions for protection against infection
by B. fragilis or for treatment of B. fragilis infection. In one embodiment, the vaccine
compositions contain one or more immunogenic components such as a surface protein
from B. fragilis, or portion thereof, and a pharmaceutically acceptable carrier. Nucleic
acids within the scope of the invention are exemplified by the nucleic acids of the
invention contained in the Sequence Listing which encode B. fragilis surface proteins.
Any nucleic acid encoding an immunogenic B. fragilis protein, or portion thereof, which
is capable of expression in a cell, can be used in the present invention. These vaccines
have therapeutic and prophylactic utilities.

One aspect of the invention provides a vaccine composition for protection against
infection by B. fragilis which contains at least one immunogenic fragment of a B.
fragilis protein and a pharmaceutically acceptable carrier. Preferred fragments include
peptides of at least about 10 amino acid residues in length, preferably about 10-20 amino
acid residues in length, and more preferably about 12-16 amino acid residues in length.

Immunogenic components of the invention can be obtained, for example, by
screening polypeptides recombinantly produced from the corresponding fragment of the
nucleic acid encoding the full-length B. fragilis protein. In addition, fragments can be
chemically synthesized using techniques known in the art such as conventional
Merrifield solid phase f-Moc or t-Boc chemistry.

In one embodiment, immunogenic components are identified by the ability of the
peptide to stimulate T cells. Peptides which stimulate T cells, as determined by, for
example, T cell proliferation or cytokine secretion, are defined herein as comprising at
least one T cell epitope. T cell epitopes are believed to be involved in initiation and
perpetuation of the immune response to the protein allergen which is responsible for the
clinical symptoms of allergy. These T cell epitopes are thought to trigger early events at
the level of the T helper cell by binding to an appropriate HLA molecule on the surface
of an antigen presenting cell, thereby stimulating the T cell subpopulation with the
relevant T cell receptor for the epitope. These events lead to T cell proliferation,
lymphokine secretion, local inflammatory reactions, recruitment of additional immune
cells to the site of antigen/T cell interaction, and activation of the B cell cascade, leading
to the production of antibodies. A T cell epitope is the basic element, or smallest unit of
recognition by a T cell receptor, where the epitope comprises amino acids essential to
receptor recognition (e.g., approximately 6 or 7 amino acid residues). Amino acid
sequences which mimic those of the T cell epitopes are within the scope of this
invention.

Screening immunogenic components can be accomplished using one or more of 
several different assays. For example, in vitro, peptide T cell stimulatory activity is 
assayed by contacting a peptide known or suspected of being immunogenic with an 
antigen presenting cell which presents appropriate MHC molecules in a T cell culture. 
Presentation of an immunogenic B. fragilis peptide in association with appropriate MHC 
molecules to T cells in conjunction with the necessary co-stimulation has the effect of 
transmitting a signal to the T cell that induces the production of increased levels of 
cytokines, particularly of interleukin-2 and interleukin-4. The culture supernatant can be 
obtained and assayed for interleukin-2 or other known cytokines. For example, any one 
of several conventional assays for interleukin-2 can be employed, such as the assay 
described in Proc. Natl. Acad. Sci. USA 86:1333 (1989), the pertinent portions of which
are incorporated herein by reference. A kit for an assay for the production of interferon is
also available from Genzyme Corporation (Cambridge, MA). 

Alternatively, a common assay for T cell proliferation entails measuring tritiated 
thymidine incorporation. The proliferation of T cells can be measured in vitro by 
determining the amount of 3H-labeled thymidine incorporated into the replicating DNA
of cultured cells. Therefore, the rate of DNA synthesis and, in turn, the rate of cell 
division can be quantified. 
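
As an illustration of how thymidine incorporation data might be reduced to a single figure, the Python sketch below computes a stimulation index, i.e., the mean counts per minute (cpm) in peptide-stimulated wells divided by the mean cpm in unstimulated control wells. The use of a stimulation index, the function name `stimulation_index`, and the example counts are assumptions introduced here for illustration only; they are not taken from this specification.

```python
from statistics import mean


def stimulation_index(stimulated_cpm, control_cpm):
    """Ratio of mean cpm in stimulated wells to mean cpm in control wells.

    Both arguments are sequences of replicate counts per minute (cpm)
    from a tritiated thymidine incorporation assay.
    """
    if not stimulated_cpm or not control_cpm:
        raise ValueError("at least one replicate is required for each condition")
    control = mean(control_cpm)
    if control <= 0:
        raise ValueError("mean control cpm must be positive")
    return mean(stimulated_cpm) / control


if __name__ == "__main__":
    # Hypothetical replicate counts, for illustration only.
    peptide_wells = [3000, 3200, 2800]
    medium_only_wells = [1000, 900, 1100]
    print(stimulation_index(peptide_wells, medium_only_wells))
```

A higher ratio indicates more thymidine incorporation, and hence more DNA synthesis, in the presence of the peptide than in its absence; any cutoff used to call a peptide stimulatory would need to be chosen by the practitioner.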

Vaccine compositions of the invention containing immunogenic components
(e.g., B. fragilis polypeptide or fragment thereof or nucleic acid encoding a B. fragilis
polypeptide or fragment thereof) preferably include a pharmaceutically acceptable
carrier. The term "pharmaceutically acceptable carrier" refers to a carrier that does not
cause an allergic reaction or other untoward effect in patients to whom it is administered.
Suitable pharmaceutically acceptable carriers include, for example, one or more of water,
saline, phosphate buffered saline, dextrose, glycerol, ethanol and the like, as well as
combinations thereof. Pharmaceutically acceptable carriers may further comprise minor
amounts of auxiliary substances such as wetting or emulsifying agents, preservatives or
buffers, which enhance the shelf life or effectiveness of the antibody. For vaccines of the
invention containing B. fragilis polypeptides, the polypeptide is co-administered with a
suitable adjuvant.

It will be apparent to those of skill in the art that the therapeutically effective
amount of DNA or protein of this invention will depend, inter alia, upon the
administration schedule, the unit dose of antibody administered, whether the protein or
DNA is administered in combination with other therapeutic agents, the immune status
and health of the patient, and the therapeutic activity of the particular protein or DNA.

Vaccine compositions are conventionally administered parenterally, e.g., by
injection, either subcutaneously or intramuscularly. Methods for intramuscular
immunization are described by Wolff et al. (1990) Science 247:1465-1468 and by
Sedegah et al. (1994) Immunology 91:9866-9870. Other modes of administration
include oral and pulmonary formulations, suppositories, and transdermal applications.
Oral immunization is preferred over parenteral methods for inducing protection against
infection by B. fragilis (Cain et al. (1993) Vaccine 11:637-642). Oral formulations
include such normally employed excipients as, for example, pharmaceutical grades of
mannitol, lactose, starch, magnesium stearate, sodium saccharin, cellulose, magnesium
carbonate, and the like.

The vaccine compositions of the invention can include an adjuvant, including, but
not limited to, aluminum hydroxide; N-acetyl-muramyl-L-threonyl-D-isoglutamine
(thr-MDP); N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as
nor-MDP); N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(CGP 19835A, referred to as MTP-PE);
RIBI, which contains three components from bacteria, namely monophosphoryl lipid A,
trehalose dimycolate, and cell wall skeleton (MPL + TDM + CWS), in a 2% squalene/Tween
80 emulsion; and cholera toxin. Others which may be used are non-toxic derivatives of
cholera toxin, including its B subunit, and/or conjugates or genetically engineered fusions
of the B. fragilis polypeptide with cholera toxin or its B subunit, procholeragenoid,
fungal polysaccharides, including schizophyllan, muramyl dipeptide, muramyl dipeptide
derivatives, phorbol esters, labile toxin of E. coli, non-B. fragilis bacterial lysates, block
polymers or saponins.

Other suitable delivery methods include biodegradable microcapsules or
immuno-stimulating complexes (ISCOMs), cochleates, or liposomes, genetically engineered
attenuated live vectors such as viruses or bacteria, and recombinant (chimeric) virus-like
particles, e.g., bluetongue. The amount of adjuvant employed will depend on the type of
adjuvant used. For example, when the mucosal adjuvant is cholera toxin, it is suitably
used in an amount of 5 mg to 50 mg, for example 10 mg to 35 mg. When used in the
form of microcapsules, the amount used will depend on the amount employed in the
matrix of the microcapsule to achieve the desired dosage. The determination of this
amount is within the skill of a person of ordinary skill in the art.

Carrier systems in humans may include enteric release capsules protecting the
antigen from the acidic environment of the stomach, and including B. fragilis polypeptide
in an insoluble form as fusion proteins. Suitable carriers for the vaccines of the invention
are enteric coated capsules and polylactide-glycolide microspheres. Suitable diluents are
0.2 N NaHCO3 and/or saline.

Vaccines of the invention can be administered as a primary prophylactic agent in
adults or in children, as a secondary prevention, after successful eradication of B. fragilis
in an infected host, or as a therapeutic agent with the aim of inducing an immune response in
a susceptible host to prevent infection by B. fragilis. The vaccines of the invention are
administered in amounts readily determined by persons of ordinary skill in the art. Thus,
for adults a suitable dosage will be in the range of 10 mg to 10 g, preferably 10 mg to 100
mg. A suitable dosage for adults will also be in the range of 5 mg to 500 mg. Similar
dosage ranges will be applicable for children. Those skilled in the art will recognize that
the optimal dose may be more or less depending upon the patient's body weight, disease,
the route of administration, and other factors. Those skilled in the art will also recognize
that appropriate dosage levels can be obtained based on results with known oral vaccines
such as, for example, a vaccine based on an E. coli lysate (6 mg dose daily up to total of
540 mg) and with an enterotoxigenic E. coli purified antigen (4 doses of 1 mg)
(Schulman et al., J. Urol. 150:917-921 (1993); Boedecker et al., American
Gastroenterological Assoc. 999:A-222 (1993)). The number of doses will depend upon
the disease, the formulation, and efficacy data from clinical trials. Without intending any
limitation as to the course of treatment, the treatment can be administered over 3 to 8
doses for a primary immunization schedule over 1 month (Boedeker, American
Gastroenterological Assoc. 888:A-222 (1993)).

In a preferred embodiment, a vaccine composition of the invention can be based
on a killed whole E. coli preparation with an immunogenic fragment of a B. fragilis
protein of the invention expressed on its surface or it can be based on an E. coli lysate,
wherein the killed E. coli acts as a carrier or an adjuvant.

It will be apparent to those skilled in the art that some of the vaccine
compositions of the invention are useful only for preventing B. fragilis infection, some
are useful only for treating B. fragilis infection, and some are useful for both preventing
and treating B. fragilis infection. In a preferred embodiment, the vaccine composition of
the invention provides protection against B. fragilis infection by stimulating humoral
and/or cell-mediated immunity against B. fragilis. It should be understood that
amelioration of any of the symptoms of B. fragilis infection is a desirable clinical goal,
including a lessening of the dosage of medication used to treat B. fragilis-caused disease,
or an increase in the production of antibodies in the serum or mucus of patients.

ANTIBODIES REACTIVE WITH B. FRAGILIS POLYPEPTIDES

The invention also includes antibodies specifically reactive with the subject B.
fragilis polypeptide. Anti-protein/anti-peptide antisera or monoclonal antibodies can be
made by standard protocols (see, for example, Antibodies: A Laboratory Manual ed. by
Harlow and Lane (Cold Spring Harbor Press: 1988)). A mammal such as a mouse, a
hamster or rabbit can be immunized with an immunogenic form of the peptide.
Techniques for conferring immunogenicity on a protein or peptide include conjugation to
carriers or other techniques well known in the art. An immunogenic portion of the
subject B. fragilis polypeptide can be administered in the presence of adjuvant. The
progress of immunization can be monitored by detection of antibody titers in plasma or
serum. Standard ELISA or other immunoassays can be used with the immunogen as
antigen to assess the levels of antibodies.

In a preferred embodiment, the subject antibodies are immunospecific for
antigenic determinants of the B. fragilis polypeptides of the invention, e.g., antigenic
determinants of a polypeptide of the invention contained in the Sequence Listing, or a
closely related human or non-human mammalian homolog (e.g., 90% homologous, more
preferably at least about 95% homologous). In yet a further preferred embodiment of the
invention, the anti-B. fragilis antibodies do not substantially cross react (i.e., react
specifically) with a protein which is, for example, less than 80 percent homologous to a
sequence of the invention contained in the Sequence Listing. By "not substantially cross
react", it is meant that the antibody has a binding affinity for a non-homologous protein
which is less than 10 percent, more preferably less than 5 percent, and even more
preferably less than 1 percent, of the binding affinity for a protein of the invention
contained in the Sequence Listing. In a most preferred embodiment, there is no
cross-reactivity between bacterial and mammalian antigens.
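
The percentage thresholds recited above for "not substantially cross react" can be expressed as a simple comparison of two measured affinities. The Python sketch below is a hypothetical illustration: it assumes that binding affinity is reported as an association constant (larger values meaning tighter binding), and the function names, tier labels, and example values are introduced here and are not part of this specification.

```python
def cross_reactivity_percent(affinity_other: float, affinity_target: float) -> float:
    """Affinity for a non-homologous protein as a percentage of the
    affinity for the target protein.

    Assumption: both affinities are association constants in the same
    units, so a larger number means tighter binding.
    """
    if affinity_target <= 0:
        raise ValueError("target affinity must be positive")
    if affinity_other < 0:
        raise ValueError("affinity cannot be negative")
    return 100.0 * affinity_other / affinity_target


def cross_reactivity_tier(percent: float) -> str:
    """Map a cross-reactivity percentage onto the 10, 5 and 1 percent tiers."""
    if percent < 1:
        return "less than 1 percent (even more preferred)"
    if percent < 5:
        return "less than 5 percent (more preferred)"
    if percent < 10:
        return "less than 10 percent (not substantially cross-reactive)"
    return "10 percent or more (substantially cross-reactive)"


if __name__ == "__main__":
    # Hypothetical association constants, for illustration only.
    pct = cross_reactivity_percent(affinity_other=2e6, affinity_target=1e8)
    print(pct, cross_reactivity_tier(pct))
```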

The term antibody as used herein is intended to include fragments thereof which
are also specifically reactive with B. fragilis polypeptides. Antibodies can be fragmented
using conventional techniques and the fragments screened for utility in the same manner
as described above for whole antibodies. For example, F(ab')2 fragments can be
generated by treating antibody with pepsin. The resulting F(ab')2 fragment can be treated
to reduce disulfide bridges to produce Fab' fragments. The antibody of the invention is
further intended to include bispecific and chimeric molecules having an anti-B. fragilis
portion.

Both monoclonal and polyclonal antibodies (Ab) directed against B. fragilis
polypeptides or B. fragilis polypeptide variants, and antibody fragments such as Fab' and
F(ab')2, can be used to block the action of B. fragilis polypeptide and allow the study of
the role of a particular B. fragilis polypeptide of the invention in aberrant or unwanted
intracellular signaling, as well as the normal cellular function of B. fragilis, for example
by microinjection of anti-B. fragilis polypeptide antibodies of the present invention.

Antibodies which specifically bind B. fragilis epitopes can also be used in
immunohistochemical staining of tissue samples in order to evaluate the abundance and
pattern of expression of B. fragilis antigens. Anti-B. fragilis polypeptide antibodies can
be used diagnostically in immuno-precipitation and immuno-blotting to detect and
evaluate B. fragilis levels in tissue or bodily fluid as part of a clinical testing procedure.
Likewise, the ability to monitor B. fragilis polypeptide levels in an individual can allow
determination of the efficacy of a given treatment regimen for an individual afflicted with
such a disorder. The level of a B. fragilis polypeptide can be measured in cells found in
bodily fluid, such as in urine samples, or can be measured in tissue, such as produced by
gastric biopsy. Diagnostic assays using anti-B. fragilis antibodies can include, for
example, immunoassays designed to aid in early diagnosis of B. fragilis infections. The
present invention can also be used as a method of detecting antibodies contained in
samples from individuals infected by this bacterium using specific B. fragilis antigens.

Another application of anti-B. fragilis polypeptide antibodies of the invention is
in the immunological screening of cDNA libraries constructed in expression vectors such
as λgt11, λgt18-23, λZAP, and λORF8. Messenger libraries of this type, having coding
sequences inserted in the correct reading frame and orientation, can produce fusion
proteins. For instance, λgt11 will produce fusion proteins whose amino termini consist
of β-galactosidase amino acid sequences and whose carboxy termini consist of a foreign
polypeptide. Antigenic epitopes of a subject B. fragilis polypeptide can then be detected
with antibodies, as, for example, reacting nitrocellulose filters lifted from infected plates
with anti-B. fragilis polypeptide antibodies. Phage, scored by this assay, can then be
isolated from the infected plate. Thus, the presence of B. fragilis gene homologs can be
detected and cloned from other species, and alternate isoforms (including splicing
variants) can be detected and cloned.

KITS CONTAINING NUCLEIC ACIDS, POLYPEPTIDES OR ANTIBODIES OF THE 
INVENTION 

The nucleic acid, polypeptides and antibodies of the invention can be combined
with other reagents and articles to form kits. Kits for diagnostic purposes typically
comprise the nucleic acid, polypeptides or antibodies in vials or other suitable vessels.
Kits typically comprise other reagents for performing hybridization reactions, polymerase
chain reactions (PCR), or for reconstitution of lyophilized components, such as aqueous
media, salts, buffers, and the like. Kits may also comprise reagents for sample
processing such as detergents, chaotropic salts and the like. Kits may also comprise
immobilization means such as particles, supports, wells, dipsticks and the like. Kits may
also comprise labeling means such as dyes, developing reagents, radioisotopes,
fluorescent agents, luminescent or chemiluminescent agents, enzymes, intercalating
agents and the like. With the nucleic acid and amino acid sequence information provided
herein, individuals skilled in the art can readily assemble kits to serve their particular
purpose. Kits further can include instructions for use.

BIO CHIP TECHNOLOGY

The nucleic acid sequence of the present invention may be used to detect B.
fragilis or other species of Bacteroides nucleic acid sequence using bio chip technology. Bio
chips containing arrays of nucleic acid sequence can also be used to measure expression
of genes of B. fragilis or other species of Bacteroides. For example, to diagnose a patient
with a B. fragilis or other Bacteroides infection, a sample from a human or animal can be
used as a probe on a bio chip containing an array of nucleic acid sequence from the
present invention. In addition, a sample from a disease state can be compared to a
sample from a non-disease state, which would help identify a gene that is up-regulated or
expressed in the disease state. This would provide valuable insight as to the mechanism
by which the disease manifests. Changes in gene expression can also be used to identify
critical pathways involved in drug transport or metabolism, and may enable the
identification of novel targets involved in virulence or host cell interactions involved in
maintenance of an infection. Procedures using such techniques have been described by
Brown et al., 1995, Science 270:467-470.

Bio chips can also be used to monitor the genetic changes of potential therapeutic
compounds, including deletions, insertions or mismatches. Once the therapeutic is added
to the patient, changes to the genetic sequence can be evaluated for its efficacy. In
addition, the nucleic acid sequence of the present invention can be used to determine
essential genes in cell cycling. As described in Iyer et al., 1999 (Science 283:83-87),
genes essential in the cell cycle can be identified using bio chips. Furthermore, the
present invention provides nucleic acid sequence which can be used with bio chip
technology to understand regulatory networks in bacteria, measure the response to
environmental signals or drugs as in drug screening, and study virulence induction
(Mons et al., 1998, Nature Biotechnology 16:45-48). Patents teaching this technology
include U.S. Patents 5445934, 5744305, and 5800992.

DRUG SCREENING ASSAYS USING B. FRAGILIS POLYPEPTIDES

By making available purified and recombinant B. fragilis polypeptides, the
present invention provides assays which can be used to screen for drugs which are either
agonists or antagonists of the normal cellular function, in this case, of the subject B.
fragilis polypeptides, or of their role in intracellular signaling. Such inhibitors or
potentiators may be useful as new therapeutic agents to combat B. fragilis infections in
humans. A variety of assay formats will suffice and, in light of the present invention,
will be comprehended by the person skilled in the art.

In many drug screening programs which test libraries of compounds and natural
extracts, high throughput assays are desirable in order to maximize the number of
compounds surveyed in a given period of time. Assays which are performed in cell-free
systems, such as may be derived with purified or semi-purified proteins, are often
preferred as "primary" screens in that they can be generated to permit rapid development
and relatively easy detection of an alteration in a molecular target which is mediated by a
test compound. Moreover, the effects of cellular toxicity and/or bioavailability of the test
compound can be generally ignored in the in vitro system, the assay instead being
focused primarily on the effect of the drug on the molecular target as may be manifest in
an alteration of binding affinity with other proteins or change in enzymatic properties of
the molecular target. Accordingly, in an exemplary screening assay of the present
invention, the compound of interest is contacted with an isolated and purified B. fragilis
polypeptide.

Screening assays can be constructed in vitro with a purified B. fragilis
polypeptide or fragment thereof, such as a B. fragilis polypeptide having enzymatic
activity, such that the activity of the polypeptide produces a detectable reaction product.
The efficacy of the compound can be assessed by generating dose response curves from
data obtained using various concentrations of the test compound. Moreover, a control
assay can also be performed to provide a baseline for comparison. Suitable products
include, for example, those with distinctive absorption, fluorescence, or chemiluminescence
properties, because their detection may be easily automated. A variety of
synthetic or naturally occurring compounds can be tested in the assay to identify those
which inhibit or potentiate the activity of the B. fragilis polypeptide. Some of these
active compounds may directly, or with chemical alterations to promote membrane
permeability or solubility, also inhibit or potentiate the same activity (e.g., enzymatic
activity) in whole, live B. fragilis cells.
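
By way of illustration only, the dose response analysis described above might be sketched
in Python as follows. The function names, the example readings, and the use of log-linear
interpolation to estimate a half-maximal inhibitory concentration are assumptions made for
this sketch; they are not part of the disclosed assay.

```python
import math

def percent_activity(signal, baseline):
    """Express a reaction-product signal as a percentage of the control assay."""
    return 100.0 * signal / baseline

def estimate_ic50(concentrations, activities):
    """Interpolate, on a log-concentration scale, the dose giving 50% activity.

    concentrations: ascending test-compound concentrations (one unit throughout)
    activities: percent of control activity measured at each concentration
    Returns None when the curve never crosses 50%.
    """
    points = list(zip(concentrations, activities))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        crosses = (a_lo - 50.0) * (a_hi - 50.0) <= 0
        if crosses and a_lo != a_hi:
            fraction = (a_lo - 50.0) / (a_lo - a_hi)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + fraction * (log_hi - log_lo))
    return None

# Hypothetical readings: the control (baseline) assay gives a signal of 2.0
doses = [0.1, 1.0, 10.0, 100.0]
signals = [1.9, 1.5, 0.5, 0.1]
activities = [percent_activity(s, 2.0) for s in signals]
ic50 = estimate_ic50(doses, activities)
```

A potentiating compound would raise activity above the baseline rather than lower it, so
the same readings would be compared against a threshold above 100% in that case.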






OVEREXPRESSION ASSAYS 

Overexpression assays are based on the premise that overproduction of a protein
would lead to a higher level of resistance to compounds that selectively interfere with the
function of that protein. Overexpression assays may be used to identify compounds that
interfere with the function of virtually any type of protein, including without limitation
enzymes, receptors, DNA- or RNA-binding proteins, or any proteins that are directly or
indirectly involved in regulating cell growth.

Typically, two bacterial strains are constructed. One contains a single copy of the
gene of interest, and a second contains several copies of the same gene. Identification of
useful inhibitory compounds in this type of assay is based on a comparison of the activity
of a test compound in inhibiting growth and/or viability of the two strains. The method
involves constructing a nucleic acid vector that directs high level expression of a
particular target nucleic acid. The vectors are then transformed into host cells in single
or multiple copies to produce strains that express low to moderate and high levels of
protein encoded by the target sequence (strains A and B, respectively). Nucleic acid
comprising sequences encoding the target gene can, of course, be directly integrated into
the host cell.

Large numbers of compounds (or crude substances which may contain active
compounds) are screened for their effect on the growth of the two strains. Agents which
interfere with an unrelated target equally inhibit the growth of both strains. Agents
which interfere with the function of the target at high concentration should inhibit the
growth of both strains. It should be possible, however, to titrate out the inhibitory effect
of the compound in the overexpressing strain. That is, if the compound is affecting the
particular target that is being tested, it should be possible to inhibit the growth of strain A
at a concentration of the compound that allows strain B to grow.
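
The comparison of strains A and B described above can be sketched, purely as an example,
in Python. The growth threshold, the four-fold shift criterion, the function names and the
growth readings are hypothetical values chosen for this sketch and are not taken from the
disclosure.

```python
def minimal_inhibitory_concentration(growth_by_conc, threshold=0.1):
    """Return the lowest tested concentration at which growth falls below threshold.

    growth_by_conc maps compound concentration -> growth relative to an
    untreated culture (1.0 = uninhibited). Returns None if growth is never
    inhibited over the tested range.
    """
    for conc in sorted(growth_by_conc):
        if growth_by_conc[conc] < threshold:
            return conc
    return None

def classify_compound(strain_a, strain_b, min_shift=4.0):
    """Compare strain A (low to moderate expression) with strain B (overexpressing)."""
    mic_a = minimal_inhibitory_concentration(strain_a)
    mic_b = minimal_inhibitory_concentration(strain_b)
    if mic_a is None:
        return "inactive"
    # Overexpression titrates out the compound: B tolerates a higher dose than A
    if mic_b is None or mic_b / mic_a >= min_shift:
        return "target-specific candidate"
    return "non-specific or unrelated target"

# Hypothetical relative growth at 1, 2, 4 and 8 units of compound
growth_a = {1: 0.9, 2: 0.05, 4: 0.0, 8: 0.0}
growth_b = {1: 1.0, 2: 0.9, 4: 0.8, 8: 0.05}
verdict = classify_compound(growth_a, growth_b)
```

A compound that inhibits both strains at the same concentration is reported as acting on
an unrelated target, which corresponds to the equal inhibition described in the text.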

Alternatively, a bacterial strain is constructed that contains the gene of interest
under the control of an inducible promoter. Identification of useful inhibitory agents
using this type of assay is based on a comparison of the activity of a test compound in
inhibiting growth and/or viability of this strain under both inducing and non-inducing
conditions. The method involves constructing a nucleic acid vector that directs
high-level expression of a particular target nucleic acid. The vector is then transformed into
host cells that are grown under both non-inducing and inducing conditions (conditions A
and B, respectively).

Large numbers of compounds (or crude substances which may contain active
compounds) are screened for their effect on growth under these two conditions. Agents
that interfere with the function of the target should inhibit growth under both conditions.
It should be possible, however, to titrate out the inhibitory effect of the compound when
the strain is overexpressing the target under inducing conditions. That is, if the compound
is affecting the particular target that is being tested, it should be possible to inhibit
growth under condition A at a concentration that allows the strain to grow under
condition B.



LIGAND-BINDING ASSAYS

Many of the targets according to the invention have functions that have not yet
been identified. Ligand-binding assays are useful to identify inhibitor compounds that
interfere with the function of a particular target, even when that function is unknown.
These assays are designed to detect binding of test compounds to particular targets. The
detection may involve direct measurement of binding. Alternatively, indirect indications
of binding may involve stabilization of protein structure or disruption of a biological
function. Non-limiting examples of useful ligand-binding assays are detailed below.

A useful method for the detection and isolation of binding proteins is the
Biomolecular Interaction Assay (BIAcore) system developed by Pharmacia Biosensor
and described in the manufacturer's protocol (LKB Pharmacia, Sweden). The BIAcore
system uses an affinity purified anti-GST antibody to immobilize GST-fusion proteins
onto a sensor chip. The sensor utilizes surface plasmon resonance, an optical
phenomenon that is used to detect changes in refractive index. In accordance with the
practice of the invention, a protein of interest is coated onto a chip and test compounds
are passed over the chip. Binding is detected by a change in the refractive index (surface
plasmon resonance).

A different type of ligand-binding assay involves scintillation proximity assays
(SPA, described in U.S. Patent No. 4,568,649).

Another type of ligand-binding assay, also undergoing development, is based on
the fact that proteins containing mitochondrial targeting signals are imported into isolated
mitochondria in vitro (Hurt et al., 1985, EMBO J. 4:2061-2068; Eilers and Schatz, 1986,
Nature 322:228-231). In a mitochondrial import assay, expression vectors are constructed
in which nucleic acids encoding particular target proteins are inserted downstream of
sequences encoding mitochondrial import signals. The chimeric proteins are synthesized
and tested for their ability to be imported into isolated mitochondria in the absence and
presence of test compounds. A test compound that binds to the target protein should
inhibit its uptake into isolated mitochondria in vitro.

Another ligand-binding assay is the yeast two-hybrid system (Fields and Song,
1989, Nature 340:245-246). The yeast two-hybrid system takes advantage of the
properties of the GAL4 protein of the yeast Saccharomyces cerevisiae. The GAL4
protein is a transcriptional activator required for the expression of genes encoding
enzymes of galactose utilization. This protein consists of two separable and functionally
essential domains: an N-terminal domain which binds to specific DNA sequences
(UASG); and a C-terminal domain containing acidic regions, which is necessary to
activate transcription. The native GAL4 protein, containing both domains, is a potent
activator of transcription when yeast are grown on galactose media. The N-terminal
domain binds to DNA in a sequence-specific manner but is unable to activate
transcription. The C-terminal domain contains the activating regions but cannot activate
transcription because it fails to be localized to UASG. The two-hybrid system uses
two hybrid proteins containing parts of GAL4: (1) a GAL4 DNA-binding domain
fused to a protein 'X' and (2) a GAL4 activation region fused to a protein 'Y'. If X and Y
can form a protein-protein complex and reconstitute proximity of the GAL4 domains,
transcription of a gene regulated by UASG occurs. Creation of two hybrid proteins, each
containing one of the interacting proteins X and Y, allows the activation region of GAL4
to be brought to its normal site of action at UASG.

The binding assay described in Fodor et al., 1991, Science 251:767-773, which
involves testing the binding affinity of test compounds for a plurality of defined polymers
synthesized on a solid substrate, may also be useful.

Compounds which bind to the polypeptides of the invention are potentially useful 
as antibacterial agents for use in therapeutic compositions. 

Pharmaceutical formulations suitable for antibacterial therapy comprise the 
antibacterial agent in conjunction with one or more biologically acceptable carriers. 
Suitable biologically acceptable carriers include, but are not limited to, phosphate- 
buffered saline, saline, deionized water, or the like. Preferred biologically acceptable 
carriers are physiologically or pharmaceutically acceptable carriers.

The antibacterial compositions include an antibacterial effective amount of active
agent. Antibacterial effective amounts are those quantities of the antibacterial agents of
the present invention that afford prophylactic protection against bacterial infections or
which result in amelioration or cure of an existing bacterial infection. This antibacterial
effective amount will depend upon the agent, the location and nature of the infection, and
the particular host. The amount can be determined by experimentation known in the art,
such as by establishing a matrix of dosages and frequencies and assigning a group of
experimental units or subjects to each point in the matrix for comparison.

The antibacterial active agents or compositions can be formed into dosage unit
forms, such as, for example, creams, ointments, lotions, powders, liquids, tablets,
capsules, suppositories, sprays, aerosols or the like. If the antibacterial composition is
formulated into a dosage unit form, the dosage unit form may contain an antibacterial
effective amount of active agent. Alternatively, the dosage unit form may include less
than such an amount if multiple dosage unit forms or multiple dosages are to be used to
administer a total dosage of the active agent. Dosage unit forms can include, in addition,
one or more excipient(s), diluent(s), disintegrant(s), lubricant(s), plasticizer(s),
colorant(s), dosage vehicle(s), absorption enhancer(s), stabilizer(s), bactericide(s), or the
like.

For general information concerning formulations, see, e.g., Gilman et al. (eds.),
1990, Goodman and Gilman's: The Pharmacological Basis of Therapeutics, 8th ed.,
Pergamon Press; Remington's Pharmaceutical Sciences, 17th ed., 1990, Mack
Publishing Co., Easton, PA; Avis et al. (eds.), 1993, Pharmaceutical Dosage Forms:
Parenteral Medications, Dekker, New York; and Lieberman et al. (eds.), 1990,
Pharmaceutical Dosage Forms: Disperse Systems, Dekker, New York.

The antibacterial agents and compositions of the present invention are useful for
preventing or treating B. fragilis infections. Infection prevention methods incorporate a
prophylactically effective amount of an antibacterial agent or composition. A
prophylactically effective amount is an amount effective to prevent B. fragilis infection
and will depend upon the specific bacterial strain, the agent, and the host. These
amounts can be determined experimentally by methods known in the art and as described
above.

B. fragilis infection treatment methods incorporate a therapeutically effective
amount of an antibacterial agent or composition. A therapeutically effective amount is
an amount sufficient to ameliorate or eliminate the infection. The prophylactically and/or
therapeutically effective amounts can be administered in one administration or over
repeated administrations. Therapeutic administration can be followed by prophylactic
administration, once the initial bacterial infection has been resolved.

The antibacterial agents and compositions can be administered topically or
systemically. Topical application is typically achieved by administration of creams,
ointments, lotions, or sprays as described above. Systemic administration includes both
oral and parenteral routes. Parenteral routes include, without limitation, subcutaneous,
intramuscular, intraperitoneal, intravenous, transdermal, inhalation and intranasal
administration.



EXEMPLIFICATION 






CLONING AND SEQUENCING B. FRAGILIS GENOMIC SEQUENCE 

This invention provides nucleotide sequences of the genome of B. fragilis which
thus comprise a DNA sequence library of B. fragilis genomic DNA. The detailed
description that follows provides nucleotide sequences of B. fragilis, and also describes
how the sequences were obtained and how ORFs (Open Reading Frames) and
protein-coding sequences can be identified. Also described are methods of using the
disclosed B. fragilis sequences, including diagnostic and therapeutic applications.
Furthermore, the library can be used as a database for identification and comparison of
medically important sequences in this and other strains of B. fragilis as well as other
species of Bacteroides.

Chromosomal DNA from strain 14062 of B. fragilis was isolated after Zymolyase
digestion, sodium dodecyl sulfate lysis, potassium acetate precipitation,
phenol:chloroform extraction and ethanol precipitation (Soll, D.R., T. Srikantha and S.R.
Lockhart: Characterizing Developmentally Regulated Genes in B. fragilis. In Microbial
Genome Methods. K.W. Adolph, editor. CRC Press. New York, p 17-37.). Genomic B.
fragilis DNA was hydrodynamically sheared in an HPLC and then separated on a
standard 1% agarose gel. Fractions corresponding to 2500-3000 bp in length were
excised from the gel and purified by the GeneClean procedure (Bio 101, Inc.).

The purified DNA fragments were then blunt-ended using T4 DNA polymerase.
The healed DNA was then ligated to unique BstXI-linker adapters
(5'-GTCTTCACCACGGGG-3' and 5'-GTGGTGAAGAC-3' in 100-1000 fold molar
excess). These linkers are complementary to the BstXI-cut pGTC vector, while the
overhang is not self-complementary. Therefore, the linkers will not concatemerize, nor
will the cut vector religate to itself easily. The linker-adapted inserts were separated from
the unincorporated linkers on a 1% agarose gel and purified using GeneClean. The
linker-adapted inserts were then ligated to BstXI-cut vector to construct a "shotgun"
subclone library.
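
The statement that the adapter overhang is not self-complementary can be checked with a
short Python sketch using the two oligonucleotide sequences given above. The annealing
geometry assumed here (the shorter oligonucleotide pairing with the 5' portion of the
longer one, leaving a single-stranded 3' tail) is inferred from the sequences and is an
assumption of this sketch, as are the function names.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

LONG_OLIGO = "GTCTTCACCACGGGG"   # 5'->3', as given in the text
SHORT_OLIGO = "GTGGTGAAGAC"      # 5'->3', as given in the text

def adapter_overhang(long_oligo, short_oligo):
    """Return the single-stranded tail left after the short oligo anneals.

    Assumes the short oligo pairs with the 5' end of the long oligo.
    """
    paired = reverse_complement(short_oligo)
    if not long_oligo.startswith(paired):
        raise ValueError("oligos do not anneal at the 5' end of the long strand")
    return long_oligo[len(paired):]

overhang = adapter_overhang(LONG_OLIGO, SHORT_OLIGO)
# An overhang equal to its own reverse complement could pair with itself
self_complementary = overhang == reverse_complement(overhang)
```

Under this assumption the four-base tail cannot pair with a copy of itself, which is
consistent with the statement that the linkers will not concatemerize.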

Only major modifications to the protocols are highlighted. Briefly, the library
was then transformed into DH5α competent cells (Gibco/BRL, DH5α transformation
protocol). It was assessed by plating onto antibiotic plates containing ampicillin and
IPTG/Xgal. The plates were incubated overnight at 37°C. Transformants were then used
for plating of clones and picking for sequencing. The cultures were grown overnight at
37°C. DNA was purified using a silica bead DNA preparation method (Engelstein, 1996).
In this manner, 25 µg of DNA was obtained per clone.

These purified DNA samples were then sequenced using primarily ABI
dye-terminator chemistry. All subsequent steps were based on sequencing by ABI377
automated DNA sequencing methods. The ABI dye terminator sequence reads were run
on ABI377 machines and the data was transferred to UNIX machines following lane
tracking of the gels. Base calls and quality scores were determined using the program
PHRED (Ewing et al., 1998, Genome Res. 8: 175-185; Ewing and Green, 1998, Genome
Res. 8: 685-734). Reads were assembled using PHRAP (P. Green, Abstracts of DOE
Human Genome Program Contractor-Grantee Workshop V, Jan. 1996, p. 157) with
default program parameters and quality scores. The initial assembly was done at 7.8 fold
coverage and yielded 223 contigs.

Finishing can follow the initial assembly. Missing mates (sequences from clones
that only gave reads from one end of the Bacteroides DNA inserted in the plasmid) can
be identified and sequenced with ABI technology to allow the identification of additional
overlapping contigs.

End-sequencing of randomly picked genomic lambda clones was also performed.
Sequencing on both sides was done for all lambda clones. The lambda library
backbone helped to verify the integrity of the assembly and allowed closure of some of
the physical gaps. Primers for walking off the ends of contigs can be selected using
pick_primer (a GTC program) near the ends of the clones to facilitate gap closure. These
walks can be sequenced using the selected clones and primers. These data are then
reassembled with PHRAP. Additional sequencing using PCR-generated templates and
screened and/or unscreened lambda templates can also be done.

To identify B. fragilis polypeptides, the complete genomic sequence of B. fragilis
was analyzed essentially as follows: First, all possible stop-to-stop open reading frames
(ORFs) greater than 180 nucleotides in all six reading frames were translated into amino
acid sequences. Second, the identified ORFs were analyzed for homology to known
(archaebacterial, prokaryotic and eukaryotic) protein sequences. Third, the coding potential
of non-homologous sequences was evaluated with the program GENEMARK(TM)
(Borodovsky and McIninch, 1993, Comp. Chem. 17:123).
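
The first step described above, identification and translation of stop-to-stop ORFs
greater than 180 nucleotides in all six reading frames, might be sketched in Python as
follows. This is not the program actually used; the function names, the use of the
standard genetic code, the treatment of the start of each frame as an ORF boundary, and
the omission of ORFs that run off the end of the sequence are all assumptions of the
sketch. Coordinates are reported on the strand being scanned.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate a sequence codon by codon; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def stop_to_stop_orfs(genome, min_nt=180):
    """Yield (strand, frame, start, end, protein) for ORFs longer than min_nt."""
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            orf_start = frame  # assumption: frame start counts as a boundary
            for pos in range(frame, len(seq) - 2, 3):
                if CODON_TABLE.get(seq[pos:pos + 3]) == "*":
                    if pos - orf_start > min_nt:
                        yield (strand, frame, orf_start, pos,
                               translate(seq[orf_start:pos]))
                    orf_start = pos + 3

# Hypothetical test sequence: 61 alanine codons between two stop codons
genome = "TAA" + "GCT" * 61 + "TAG"
orfs = list(stop_to_stop_orfs(genome))
```

The translated sequences produced in this way would then be passed to the homology
search and coding-potential steps, which are not reproduced here.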

IDENTIFICATION, CLONING AND EXPRESSION OF B. FRAGILIS NUCLEIC
ACIDS

Expression and purification of the B. fragilis polypeptides of the invention can be 
performed essentially as outlined below. 

To facilitate the cloning, expression and purification of membrane and secreted
proteins from B. fragilis, a gene expression system, such as the pET System (Novagen),
for cloning and expression of recombinant proteins in E. coli, is selected. Also, a DNA
sequence encoding a peptide tag, the His-Tag, is fused to the 3' end of DNA sequences
of interest in order to facilitate purification of the recombinant protein products. The 3'
end is selected for fusion in order to avoid alteration of any 5' terminal signal sequence.

PCR AMPLIFICATION AND CLONING OF NUCLEIC ACIDS CONTAINING ORF'S 
ENCODING ENZYMES 

Nucleic acids chosen (for example, from the nucleic acids set forth in SEQ ID
NO: 1 - SEQ ID NO: 5222) for cloning from the 14062 strain of B. fragilis are prepared
for amplification cloning by polymerase chain reaction (PCR). Synthetic oligonucleotide
primers specific for the 5' and 3' ends of open reading frames (ORFs) are designed and
purchased from GibcoBRL Life Technologies (Gaithersburg, MD, USA). All forward
primers (specific for the 5' end of the sequence) are designed to include an NcoI cloning
site at the extreme 5' terminus. These primers are designed to permit initiation of
protein translation at a methionine residue followed by a valine residue and the coding
sequence for the remainder of the native B. fragilis DNA sequence. All reverse primers
(specific for the 3' end of any B. fragilis ORF) include an EcoRI site at the extreme 5'
terminus to permit cloning of each B. fragilis sequence into the reading frame of the
pET-28b. The pET-28b vector provides sequence encoding an additional 20
carboxy-terminal amino acids including six histidine residues (at the extreme C-terminus), which
comprise the His-Tag.
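
The primer layout described above can be illustrated with the following Python sketch.
It is offered as an example only: the choice of GTG as the valine codon, the 18-base
annealing length, the removal of the native start and stop codons, and the example ORF
are hypothetical, and the sketch does not verify the reading frame relative to the
vector-encoded His-Tag.

```python
NCOI_SITE = "CCATGG"    # NcoI recognition site; contains an ATG start codon
ECORI_SITE = "GAATTC"   # EcoRI recognition site
VALINE_CODON = "GTG"    # assumption: any GTN codon would encode valine

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def design_primers(orf, anneal_len=18):
    """Sketch forward and reverse primers for an ORF given 5'->3' (coding strand).

    Assumptions (not from the text): the ORF includes its native start and
    stop codons, both are dropped, and anneal_len bases anneal to the template.
    """
    body = orf[3:-3]
    for site in (NCOI_SITE, ECORI_SITE):
        if site in body:
            raise ValueError(f"internal {site} site would be cut during cloning")
    # CC + ATG + GTG places CCATGG at the extreme 5' terminus, then Met-Val
    forward = "CC" + "ATG" + VALINE_CODON + body[:anneal_len]
    # The EcoRI site sits at the extreme 5' terminus of the reverse primer
    reverse = ECORI_SITE + reverse_complement(body[-anneal_len:])
    return forward, reverse

example_orf = "ATG" + "GCTAAAGAACTGACCCGT" * 2 + "TAA"   # hypothetical ORF
forward_primer, reverse_primer = design_primers(example_orf)
```

The check for internal NcoI and EcoRI sites is included because the amplified DNA is
later digested with these enzymes; an ORF containing either site would need a different
pair of cloning sites.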

Genomic DNA prepared from the 14062 strain of B. fragilis is used as the source
of template DNA for PCR amplification reactions (Current Protocols in Molecular
Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). To amplify a DNA
sequence containing a B. fragilis ORF, genomic DNA (50 nanograms) is introduced
into a reaction vial containing 2 mM MgCl2, 1 micromolar synthetic oligonucleotide
primers (forward and reverse primers) complementary to and flanking a defined B.
fragilis ORF, 0.2 mM of each deoxynucleotide triphosphate (dATP, dGTP, dCTP, dTTP)
and 2.5 units of heat stable DNA polymerase (Amplitaq, Roche Molecular Systems, Inc.,
Branchburg, NJ, USA) in a final volume of 100 microliters.
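
As a worked example of assembling such a reaction, the volume of each stock solution
needed to reach the stated final concentrations in 100 microliters can be computed from
C1 x V1 = C2 x V2. The stock concentrations below are hypothetical and would be replaced
by those of the reagents actually on hand; only the final concentrations and the final
volume come from the text.

```python
FINAL_VOLUME_UL = 100.0  # final reaction volume stated in the text

def volume_needed(stock, final, total_ul=FINAL_VOLUME_UL):
    """Solve C1*V1 = C2*V2 for V1; stock and final must share one unit."""
    return final * total_ul / stock

# name: (hypothetical stock concentration, final concentration from the text)
components = {
    "MgCl2 (mM)": (25.0, 2.0),
    "each primer (micromolar)": (10.0, 1.0),
    "each dNTP (mM)": (10.0, 0.2),
}
volumes_ul = {name: volume_needed(stock, final)
              for name, (stock, final) in components.items()}
```

The remaining volume is made up by template DNA, polymerase, reaction buffer and water.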

Upon completion of thermal cycling reactions, each sample of amplified DNA is
washed and purified using the Qiaquick Spin PCR purification kit (Qiagen, Gaithersburg,
MD, USA). All amplified DNA samples are subjected to digestion with the restriction
endonucleases, e.g., NcoI and EcoRI (New England BioLabs, Beverly, MA,
USA) (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et
al., eds., 1994). DNA samples are then subjected to electrophoresis on 1.0% NuSieve
(FMC BioProducts, Rockland, ME USA) agarose gels. DNA is visualized by exposure
to ethidium bromide and long wave UV irradiation. DNA contained in slices isolated
from the agarose gel is purified using the Bio 101 GeneClean Kit protocol (Bio 101,
Vista, CA, USA).

CLONING OF B. FRAGILIS NUCLEIC ACIDS INTO AN EXPRESSION VECTOR

The pET-28b vector is prepared for cloning by digestion with restriction
endonucleases, e.g., NcoI and EcoRI (Current Protocols in Molecular Biology, John
Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). The pET-28a vector, which encodes
a His-Tag that can be fused to the 5' end of an inserted gene, is prepared by digestion
with appropriate restriction endonucleases.

Following digestion, DNA inserts are cloned (Current Protocols in Molecular
Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994) into the previously
digested pET-28b expression vector. Products of the ligation reaction are then used to
transform the BL21 strain of E. coli (Current Protocols in Molecular Biology, John Wiley
and Sons, Inc., F. Ausubel et al., eds., 1994) as described below.

TRANSFORMATION OF COMPETENT BACTERIA WITH RECOMBINANT
PLASMIDS

Competent bacteria, E. coli strain BL21 or E. coli strain BL21(DE3), are
transformed with recombinant pET expression plasmids carrying the cloned B. fragilis
sequences according to standard methods (Current Protocols in Molecular Biology, John
Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). Briefly, 1 microliter of ligation
reaction is mixed with 50 microliters of electrocompetent cells and subjected to a high
voltage pulse, after which samples are incubated in 0.45 milliliters SOC medium (0.5% yeast
extract, 2.0% tryptone, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4 and
20 mM glucose) at 37°C with shaking for 1 hour. Samples are then spread on LB agar
plates containing 25 microgram/ml kanamycin sulfate for growth overnight.
Transformed colonies of BL21 are then picked and analyzed to evaluate cloned inserts as
described below.



IDENTIFICATION OF RECOMBINANT EXPRESSION VECTORS WITH B.
FRAGILIS NUCLEIC ACIDS

Individual BL21 clones transformed with recombinant pET-28b B. fragilis ORFs
are analyzed by PCR amplification of the cloned inserts using the same forward and
reverse primers, specific for each B. fragilis sequence, that were used in the original PCR
amplification cloning reactions. Successful amplification verifies the integration of the
B. fragilis sequences in the expression vector (Current Protocols in Molecular Biology,
John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994).



ISOLATION AND PREPARATION OF NUCLEIC ACIDS FROM
TRANSFORMANTS

Individual clones of recombinant pET-28b vectors carrying properly cloned B.
fragilis ORFs are picked and incubated in 5 ml of LB broth plus 25 microgram/ml
kanamycin sulfate overnight. The following day plasmid DNA is isolated and purified
using the Qiagen plasmid purification protocol (Qiagen Inc., Chatsworth, CA, USA).



EXPRESSION OF RECOMBINANT B. FRAGILIS SEQUENCES IN E. COLI

The pET vector can be propagated in any E. coli K-12 strain, e.g., HMS174,
HB101, JM109, DH5, etc., for the purpose of cloning or plasmid preparation. Hosts for
expression include E. coli strains containing a chromosomal copy of the gene for T7
RNA polymerase. These hosts are lysogens of bacteriophage DE3, a lambda derivative
that carries the lacI gene, the lacUV5 promoter and the gene for T7 RNA polymerase. T7
RNA polymerase is induced by addition of isopropyl-β-D-thiogalactoside (IPTG), and
the T7 RNA polymerase transcribes the gene of interest carried on any target plasmid,
such as pET-28b. Strains used include: BL21(DE3) (Studier, F.W., Rosenberg, A.H.,
Dunn, J.J., and Dubendorff, J.W. (1990) Meth. Enzymol. 185, 60-89).

To express recombinant B. fragilis sequences, 50 nanograms of plasmid DNA
isolated as described above is used to transform competent BL21(DE3) bacteria as
described above (provided by Novagen as part of the pET expression system kit). The
lacZ gene (beta-galactosidase) is expressed in the pET-System as described for the B.
fragilis recombinant constructions. Transformed cells are cultured in SOC medium for 1
hour, and the culture is then plated on LB plates containing 25 micrograms/ml
kanamycin sulfate. The following day, bacterial colonies are pooled and grown in LB
medium containing kanamycin sulfate (25 micrograms/ml) to an optical density at 600
nm of 0.5 to 1.0 O.D. units, at which point 1 millimolar IPTG is added to the culture
for 3 hours to induce gene expression of the B. fragilis recombinant DNA constructions.

After induction of gene expression with IPTG, bacteria are pelleted by
centrifugation in a Sorvall RC-3B centrifuge at 3500 x g for 15 minutes at 4°C. Pellets
are resuspended in 50 milliliters of cold 10 mM Tris-HCl, pH 8.0, 0.1 M NaCl and 0.1
mM EDTA (STE buffer). Cells are then centrifuged at 2000 x g for 20 min at 4°C. Wet
pellets are weighed and frozen at -80°C until ready for protein purification.

A variety of methodologies known in the art can be utilized to purify the isolated
proteins (Current Protocols in Protein Science, John Wiley and Sons, Inc., J. E. Coligan
et al., eds., 1995). For example, the frozen cells may be thawed, resuspended in buffer
and ruptured by several passages through a small volume microfluidizer (Model M-110S,
Microfluidics International Corporation, Newton, MA). The resultant homogenate may
be centrifuged to yield a clear supernatant (crude extract) and following filtration the
crude extract may be fractionated over columns. Fractions may be monitored by
absorbance at 280 nm (OD280), and peak fractions may be analyzed by SDS-PAGE.

The concentrations of purified protein preparations may be quantified
spectrophotometrically using absorbance coefficients calculated from amino acid content
(Perkins, S.J. 1986 Eur. J. Biochem. 157, 169-180). Protein concentrations are also
measured by the method of Bradford, M.M. (1976) Anal. Biochem. 72, 248-254, and
Lowry, O.H., Rosebrough, N., Farr, A.L. & Randall, R.J. (1951) J. Biol. Chem. 193,
pages 265-275, using bovine serum albumin as a standard.
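
A minimal Python sketch of quantifying a protein from an absorbance coefficient
calculated from amino acid content is given below. The per-residue coefficients used
here (5500 per tryptophan, 1490 per tyrosine and 125 per disulfide bond, in M^-1 cm^-1
at 280 nm) are widely used approximations and are an assumption of this sketch; they are
not necessarily the values of the reference cited above. The example sequence and
function names are hypothetical.

```python
def molar_extinction_280(sequence, disulfides=0):
    """Approximate molar absorbance coefficient at 280 nm (M^-1 cm^-1).

    sequence: one-letter amino acid sequence; disulfides: number of cystines.
    """
    return (5500 * sequence.count("W")
            + 1490 * sequence.count("Y")
            + 125 * disulfides)

def molar_concentration(a280, sequence, path_cm=1.0, disulfides=0):
    """Apply the Beer-Lambert law: concentration = A / (epsilon * path length)."""
    epsilon = molar_extinction_280(sequence, disulfides)
    if epsilon == 0:
        raise ValueError("no Trp, Tyr or cystine; A280 cannot be used")
    return a280 / (epsilon * path_cm)

# Hypothetical peptide with two tryptophans and one tyrosine
epsilon = molar_extinction_280("MWWYK")
concentration_molar = molar_concentration(1.249, "MWWYK")
```

Proteins lacking tryptophan, tyrosine and cystine give no usable coefficient, in which
case a dye-binding or colorimetric method such as those cited above would be used.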

SDS-polyacrylamide gels of various concentrations may be purchased from
BioRad (Hercules, CA, USA), and stained with Coomassie blue. Molecular weight
markers may include rabbit skeletal muscle myosin (200 kDa), E. coli β-galactosidase
(116 kDa), rabbit muscle phosphorylase B (97.4 kDa), bovine serum albumin (66.2 kDa),
ovalbumin (45 kDa), bovine carbonic anhydrase (31 kDa), soybean trypsin inhibitor
(21.5 kDa), egg white lysozyme (14.4 kDa) and bovine aprotinin (6.5 kDa).






EQUIVALENTS 

Those skilled in the art will recognize, or be able to ascertain using no more than
routine experimentation, many equivalents to the specific embodiments and methods
described herein. The specific embodiments described herein are offered by way of
example only, and the invention is to be limited only by the terms of the appended claims,
along with the full scope of equivalents to which such claims are entitled.

Score Probability



Locus Name 



pir:JQ1020



3.4e-167



Acc# 
JQ1020 



ORF Name 



Protein name 



NTID 



NT 



AA 



AAID Length Length 




Score Probability 



BITT 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



NT 



AA 



NTID 



i2&aim..±i.-i 



AAID Length Length 



Score Probability 
2..3e-4B 



Protein name 



Locus Name 



glucan ■ ' 

1, 4-beta-glucosidase, ; exo-1 , 4-beta-glucosidase 



Description 



pir : JC4825 



ACC# 



JC4825 



ORF Name 



Protein name 



NTID 



NT AA 

— — , Score Probab ility 
AAID Length Length 



2.iii$±&:l±±..± 



25 



5247 



T7IT 



Locus Name 



glutathione peroxidase



gp:LLAJ103 



Acc# 



AJ000109 



Description 



Lactococcus lactis carB and gpo genes . 






ORF Name 



Protein name 



Description 



~ — Score Probability 

NT ID AAID Length Length 



TTT 



Locus Name 



Acc# 



NO-HIT



ORF Name 



Protein name



Description 



NT 



AA 



fI7 



NT ID AAID Length Length 

— 



Score Probability 



Locus Name 



Acc# 



NO-HIT 



ORF Name 



NT 



AA 



NT ID 



AAID Length Length 



Score Probability 



T5T 



6 .5e-^b 



Protein name 

Description 
ARYLSULFATASE F PRECURSOR (ASF)



Locus Name 



sp:ARSF_HUMAN



Acc# 



P54793 



NT 



ORF Name 



NT ID 



AAID Length Length 



Score Probability



1&(X1S1&1...cl2.~3.S 



5251 



TuT!T 



Protein name 



Description 



Locus Name 



sp:HEXA_PORGI



Acc# 



P49008 



(BETA-NAMAtSliJ) 






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



24486016 i2 21 



ITT 



575T 



1 .4e-09 



Protein name 



response regulator 



Locus Name 
|gp:SPAJbiyy 



Acc# 



AJ006398 



Description 



Streptococcus pneumoniae rrt)i => and hkl)^ genes; two component systemu^T 



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



25584S2b ti 10 



□ 



31" 



575T 



TTST~ 



TFT 



2 .Se-14 



Protein name 



Locus Name 



putative secreted protein



gp:S(jy4l 



Acc# 



AL117387 



Description 



Streptomyces coeiicoior cosmia m. 



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



amab^c^aa... | 

Protein name 



TZ5T 



535" 



2W 



5.0e-^2 



Locus Name 



phosphonate monoester hydrolase



gp:BCU44852



Acc# 



U44852 



Description 



Burkholderia caryophylli PG2982 phosphonate monoester hydrolase (pehA) gene,
complete cds.



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
(TO 



Score Probability 



TTFT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



2932812 c2 69 



1.4e-34 



Protein name 



Locus Name 



sp:ARSE_HUMAN



Acc# 



P51690 



Description 
ARYLSULFATASE E PRECURSOR, (ASE)



NT 



AA 



ORF Name 



NTID 



AAID 



302020^ C2 67 



35" 



Length Length 
1272 



Score Probability 
2.3e-6S " 



Protein name 



Description 



Locus Name 



Acc# 



P49008 



(BETA-NAHASE)



ORF Name 



NTID 



AAID 



— — Score Probability 
Length Length 



1464 



1 . oe-7i 



Protein name 



Description 



Locus Name 



sp:MODF_ECOLI



Acc# 



P31060 



PROTEIN PHRA)



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



\±12&.5M....a±..M. 



\J7W 



TTTT 



TUT 



1.9e-12b 



Protein name 



Locus Name 



hypothetical protein dzu^I 



pir:H64976



Acc# 



H64976 



Description 






ORF Name 



NTID 



AAID 



^2124B'A ci bl" 



5260 



— — Score Probability 

Length Length ~ 

9.4e-78 



Protein name 



Locus Name 



sp:MGH_K(JuLi 



Acc# 



P31217 



Description 



ORF Name 



NTID 



AAID 



^ — Score Probability 

Length Length 



l.le-52 



Protein name 



Locus Name 



melibiase



| gp:TBMh!LA 



Acc# 



Y08557 



Description 



T.ethanolicus melA and lacA genes.



0 



ORF Name 



NTID 



AAID 



— — Score Probability 

Length Length 



[TOT" 



12ld 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



— — Score Probability 



ORF Name 



NTID 



AAID Length Length 



|15.6.Alb.:/.b.-..al...:/.i 



3TTTT 



TTT" 



O.OOib 



Protein name 



Locus Name 



cytochrome-c oxidase, chain III



pir:S36954



ACC# 



S36954 



Description 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



165327S0 &z 104 



5264 



T7T" 



Protein name 



Locus Name 



F14N23.29



gp:AC005489



Acc# 



AC005489 



Description 

Genomic sequence for Arabidopsis thaliana BAC F14N23 from chromosome 1,
complete sequence.



ORF Name 



NTID 



AAID 



1682842-7 rl b 



Protein name 



N utilization substance protein A 



Description 



NT Score Probability 

1.3e-57 



Length Length 



Locus Name 



pir:H72213



Acc# 



H72213



ORF Name 



19.5.15.126...±A...ba.. 



Protein name 



Description 



NT 



AA 



NTID 



AAID 



44 



$266 



Length Length 
B1T5 " 



Score Probability 
1 . le-81 



WIG 



Locus Name 



sp:ABCX_CYAPA



Acc# 



P48255 



PROBABLE ATP-DEPENDENT TRANSPORTER YCF16



ORF Name 



NTID 



— — Score Probability 

AAID Length Length 



IBT 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT






ORF Name NT ID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability


2054635_r2__34 46 


526B 


115 348 35.3 




3.4e-32 


Protein name 






Locus Name 




Acc# 


hypothetical protein £>o»66 


pir:B64825




B64825 


Description 












ORF Name NT ID 


AAID 


NT 
Length 


AA 

„ — . , Score 
Length 


Probability 


2.D..7.M0.0.k„.al...6.^ 47 


5269 


553 1662 742 




z . ie- / jj 


Protein name 


Locus Name 




Acc# 


probable secreted alpha-galactosidase


pir:T36472 




T36472 


Description 












ORF Name NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability


22.5.19.^^1.1^3. 4 8 


5270 


449 1350 15.32 




4.0e-157 


Protein name 


Locus Name 




Acc# 


L-fucose permease


gp:AF137263




AF137263 


Description 














Bacteroides thetaiotaomicron 30S ribosomal protein S16-like protein, fucose
gene cluster, and RNA polymerase sigma factor SigZ-like protein (sigZ) genes,
complete cds.














ORF Name NTID 


AAID 


NT 
Length 


AA 

— n Score 
Length 


Probability 


21$±1±&±^G±^1..... 49 


5271 


573 1722 2y7 






Protein name 


Locus Name 




Acc# 


receptor antigen (RagA)


gp:PGT130aV2 




AJ130872 


Description 














Porphyromonas gingivalis W50 receptor antigen (rag) locus encoding a major
immunodominant 55kDa antigen.


















ORF Name 



NT ID 



AAID 



NT AA score Probability 

Length Length 



[23679512 cl bl 



PIT 



\5TTT 



5^5 



|7.7e-42~ 



Protein name 



Locus Name 



115K outer membrane protein precursor : SusC
protein



pir:JC6027



Acc# 



JC6027 



Description 




Protein name 
probable sigK protein 



Locus Name 



pir:F70830



ACC# 



F70830 



Description 




Protein name 
unknown



Locus Name 



] |gp:U96 771 



Acc# 
U96771 



Description 




ORF Name 



NT ID 



AAID 



2.48.D.Mb.:A...al^b.b.... 



5275" 



— — Score Probability 



Length Length 
TT5 



TTT 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT






NT 



AA 



ORF Name 



NT ID 



AAID 



"FT" 



Length Length 
TTH — 



T77~ 



Score Probability 



Til 



Protein name 



Locus Name 



sp:TRHY_RABIT



Acc# 



P37709 



Description 
TRICHOHYALIN



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 
1471 



Score Probability 
2.2e-07 



ITT5 



Protein name 



Description 



Locus Name 



sp:YMBC_ii!cJOLl 



Acc# 



P03843 



HYPOTHETICAL 16.8 KB PkuTElM IN MUriA-M ET¥ IN'l'EkGENic keuiuk 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



S71W 



Score Probability 
7.7e-ia6 



Protein name 



Description 



Locus Name 



Acc# 



sp:Y074_SYNY3 | Q55790



NT 



AA 



ORF Name 



NTID 



ST 



AAID Length Length 
7¥5 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 



ORF Name 



'3057«i26 ri 4 



Protein name 



NTID 



55" 



5280 



NT 



AA 



AAID Length Length 



Score Probability 



OUT 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



[FT 



probable serine proteinase 



Description 



NT 



AA 



AAID Length Length 



Score Probability 



B9 



T7TT 



7T 



Locus Name 



pir:T36552



0.0077 



Acc# 



T36552 



ORF Name 



NTID 



NT AA 

— — Score Probability
AAID Length Length 



m42<m..±2..J.i. 



TTTT 



WTT 



l.ie-40 



Protein name 



Description 



Locus Name 



sp:Y076_SYNY3



ACC# 
Q55792 



HYPOTHETICAL 50.0 KD PROTEIN SLR0076



NT 



AA 



ORF Name 



3.M4RMZ...tl...8. J 



FT 



NTID AAID Length Length 



Score Probability 



TFT" 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



|2.4e-86 



Protein name 



Locus Name 



Acc# 



P11549 



Description 

LACTALDEHYDE REDUCTASE, (PROPANEDIOL OXIDOREDUCTASE)



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



3942813 13 47 



Z3 



TuT£~ 



ll.7e-103 



Protein name 



Locus Name 



L-fuculose-1-phosphate aldolase



gp:AF137263



ACC# 



AF137263 



Description 

Bacteroides thetaiotaomicron 30S ribosomal protein S16-like protein, fucose
gene cluster, and RNA polymerase sigma factor SigZ-like protein (sigZ) genes,



ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 


40.0.15.15.^al^.U 


64 


5286 


388 1167 


13S 


3.7e-06 



Protein name 



Locus Name 



transmembrane sensor 



gp:AF051691



ACC# 



AF051691 



Description 

Pseudomonas aeruginosa stress factor A (psfA), ECF sigma factor (fiuI),
transmembrane sensor (fiuR), and hydroxamate-type ferrisiderophore receptor
(fiuA) genes, complete cds.



ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


±±$.6.0.0±.±±..Zb 


65 


S287 


488 


1467 1748 


;5.2e-180 



Protein name 



Locus Name 



L-fuculose kinase



gp:AF137263



Acc# 



AF137263 



Description 



Bacteroides thetaiotaomicron 30S ribosomal protein S16-like protein, fucose
gene cluster, and RNA polymerase sigma factor SigZ-like protein (sigZ) genes,
complete cds.






NT 



AA 



ORF Name 



NT ID 



5157137 ±1 6 



AAID Length Length 

wzm — 



pur 



Score Probability 
T5F3 



1.2e-171 



Protein name 



Locus Name 



Initiation factor IF2-alpha



gp:ECAJ2540 



Acc# 
AJ002540 



Description 



Escherichia coli (strain EcoAU9307) infB gene encoding translational
initiation factor IF2.



NT 



AA 



ORF Name 



15366453 ti S 



W7 



NTID AAID Length Length 




Score Probability 
TTUS 



5,5e-ll2 



Protein name 



Locus Name 



nifS-like protein



IgpTMZHrr 



Acc# 



Z98741 



Description 



Mycobacterium leprae cosmid B22 , 






NT 



AA 



ORF Name 



NTID 



6.2.5.ZD.3.3„..t2...2.:/. 



AAID Length Length 
5290 



f5T 



Score Probability 
233 



8.1e-22 



Protein name 



Locus Name 



L-fucose permease



gp:AF137263



ACC# 



AF137263 



Description 



Bacteroides thetaiotaomicron 30S ribosomal protein S16-like protein, fucose
gene cluster, and RNA polymerase sigma factor SigZ-like protein (sigZ) genes,
complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



iD..7.^2,QlD....al...6.1 



Length Length 



Score Probability 
7.3e-62 



533 



Protein name 



Locus Name 



sp:PFLD_ECOLI



Acc# 



P32674 



Description 

FORMATE ACETYLTRANSFERASE 2, (PYRUVATE FORMATE-LYASE 2)






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



i39I3a«7 c2 82 



7TT 



|2.7e-74 



Protein name 



Locus Name 



hypothetical protein



pir:F72395



Acc# 



F72395 



Description 



NT 



AA 



ORF Name 



NTID 



iI5.6.^0.yj.:A...c3....10.B. I [7T 



AAID Length Length 




Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



72 



T5T 



3.9e-45 



Protein name 



Locus Name 



probable pyruvate formate-lyase activating
enzyme, pflG homolog



pir:A69431



Acc# 



A69431



Description 



NT 



AA 



ORF Name 



NTID 



7T 



AAID Length Length 

— 



Score Probability 



7S - 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



7T 



Length Length 
TTS 1 fTTJF 



Score Probability 
l2.5e-20 



Protein name 



Locus Name 



probable competence protein ComF 



pir:F75402



Acc# 



F75402 



Description 






NT 



AA 



ORF Name 



NTID 



23437627 tl 1 



AAID Length Length 




Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



76 



T3W 



14107 



1.7e-40 



Protein name 



Locus Name 



bZIP histidine kinase



] |gp:P^UVlB2 Tr 



Acc# 



Y18245 



Description 



Pseudomonas putida todX, todF, todC1, todC2, todB, todA, todD,
todI, todH, todS, todT genes.



NT 



AA 



ORF Name 



3.0.S..7.3.7.6.1..±1...4.. 



T7 



NTID AAID Length Length 

fITU 



— Score Probability 



5299 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



ORF Name 



NTID 



— — Score Probability 



AAID Length Length 



7F" 



5300 



114$ 



Protein name 



Locus Name 



Acc# 



Description 
NO-HIT






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



ri ib 



5301 



77T 



|2.7e-71 



Protein name 



Locus Name 



alpha galactosidase precursor



Acc# 



AF061331 



Description 



Saccharopolyspora erythraea alpha galactosidase precursor (melA) gene,
complete cds.



NT 



AA 



ORF Name



NT ID 



AAID Length Length 



Score Probability 



3540877 cl b'l 



TO" 



|4.7e-l2 



Protein name 



Description 



Locus Name 



sp:YCHE_fiA(JtfU 



Acc# 



P94425 



HYPO T HETICAL 10 . s> KB J^koTiillJsJ IN. PHkO-cibH Ifl'l'UkcJIWTC kKc^loN 



NT 



AA 



ORF Name 



NT ID 



iD.M6.2.7...±1...17..... 



FT" 



AAID Length Length 




TuT7T 



Score Probability 
6.3e-88 



T7F 



Protein name 



Locus Name 



115K outer membrane protein precursor : SusC
protein 



pir:JC6027



Acc# 



JC6027 



Description 



ORF Name 



|41&lb.D.7....cJ....luL. 



Protein name 



NTID 



AAID 



putative aldose 1-epimerase



Description 

Streptomyces coelicolor cosmid 4A7.



— — Score Probability 
Length Length 



T5T 



1149 



Locus Name 



gp:SC4A7



5.2e-84 



Acc# 



AL133423 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



4103812 12 21 



83 



6.2e-33 



Protein name 



Locus Name 



Acc# 



sp:SUHB_ECOLI



Description 



ORF Name 



4422758 11 IB 



Protein name 



NTID 



84 



NT 



AA 



AAID Length Length 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



Description 



NT 



AA 



NTID 



AAID Length Length 



— ' Score Probability 



TTJT 



5.5e-78 



Locus Name 



sp:XYLE_ECOLI



Acc# 



P09098 



D-XYLOSE-PROTON SYMPORTER (D-XYLOSE TRANSPORTER)



ORF Name 



NTID 



— — Score Probability 
AAID Length Length 



ll$.$All...r£..:3.a 



ft 



Protein name 



Locus Name 



Acc# 



Description 
NO-HIT






ORF Name 



10820130 cl 2±ti 



Protein name 



Description 



NO-HIT 



NT 



AA 



NTID 



AAID Length Length



Score Probability 



Locus Name 



Acc# 



ORF Name 



Protein name 



Description 



NT 



AA 



NTID 



AAID 



Length Length 
T£G 



Score Probability 



Locus Name 



Acc# 



NO-HIT



ORF Name 



Protein name 



unknown



Description 



NT 



AA 



NTID 



AAID 



89 



15311 



Length Length 



Score Probability 
8.9e-18 



Locus Name 



gp:AF125164



Acc# 



AF125164 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes.



NT 



AA 



ORF Name 



imadB-ib^t^ni I m 



NTID AAID Length Length 

mv2 — 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 






ORF Name 



NTID 



AAID



NT AA 

— — Score Probability 
Length Length 



14296883 ci 248 



3T~ 



TuTT 



TOT" 



2 . 5e-22 



Protein name 



Locus Name 



Acc# 



conserved hypothetical protein afovbi 



pir:E69347



E69347 



Description 



ORF Name 



NTID 



NT AA 
— , — , Score 
AAID Length Length — 



15214 



7TT 



Probability 
|i.0e-78 



Protein name 



Description 



Locus Name 



gp:ECU89166



Acc# 



U89166 



Eikenella corrodens lysine decarboxylase (ECORLD) gene, complete cds.



NT 



AA 



ORF Name 



NTID 



ST 



AAID Length Length 



T7TD" 



Score Probability 
5.5a-89. 



Protein name 



Locus Name 



single-strand DNA-specific exonuclease
homolog yrvE



pir:H69980



Acc# 



H69980



Description 



ORF Name 



NTID 



NT AA 
— — Score 
AAID Length Length — 



ST" 



1203 



Protein name 



Locus Name 



renin-binding protein-related protein : protein
slr1975 : protein slr1975



pir:S75649



Description 



Probability 
|1.5e-68 



Acc# 



S75649 






ORF Name 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 
C25B 



Score Probability 



TT5 



Locus Name 



Acc# 



Description 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



\±6£±6MJ±.±l..±9:i I IS5 



TTKT 



Protein name 



Locus Name 



coenzyme F390 synthetase (ftsA-3) homolog



Description 



pir:D69501



2.3e-115 



Acc# 



D69501 



ORF Name 



NTID 



Protein name 



TT 



AAID 



NT 



AA 



Length Length 



Score Probability 



Locus Name 



Acc# 






Description 



NO-HIT



ORF Name 



Protein name 



NTID 



NT AA 

— — Score Probability 
AAID Length Length — J ~ 



TIT 



1.4e-i9 



Locus Name 



sp:YHCG_ECOLI



Acc# 



P45423 



Description 

HYPOTHETICAL 43.3 KD PROTEIN IN GLTF-NANT INTERGENIC REGION (O375)






NT 



AA 



ORF Name 



NTID 



AAID 



±$S2S6$'J> c3 316 



Length Length 

— I Kwn — 



Score Probability 
2.2e-94 



Protein name 



Locus Name 



putative epimerase/dehydratase WbiI



gp:AF064070



Acc# 



AF064070 



Description 



Burkholderia pseudomallei putative dihydroorotase (pyrC) gene, partial cds;
putative 1-acyl-sn-glycerol-3-phosphate acyltransferase (plsC), putative
diadenosine tetraphosphatase (apaH), complete cds; type II O-antigen
biosynthesis gene cluster, complete sequence; putative undecaprenyl
phosphate N-acetylglucosaminyltransferase, and putative



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length ~ A ~ 



)2&l±SB2...£2../±&& J (TTO 



101 



7T 



0.013 



Protein name 



Locus Name 



sp:NU3M_RAT 



ACC# 



P05506 



Description 
NADH-UBIQUINONE OXIDOREDUCTASE CHAIN 3,



NT 



AA 



ORF Name 



NTID 



12j£lJSajSjSjQj0...c2„ i 3jQ.4. 



101 



AAID Length Length 
5321 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 



ORF Name 



NT 



AA 



NTID 



$ll£&l..G2 m 2te J lira 



AAID Length Length 
5124 — 



T3W 



Score Probability 
12 .-0e-i7 



3M 



Protein name 



Locus Name 



Cap8K



gp:SAU73374



ACC# 



U73374 



Description 



Staphylococcus aureus type 8 capsule genes, cap8A, cap8B, cap8C, cap8D,
cap8E, cap8F, cap8G, cap8H, cap8I, cap8J, cap8K, cap8L, cap8M, cap8N, cap8O,
cap8P, complete cds.






NT 



AA 



ORF Name 



NTID 



AAID 



2117305 c3 317 



TUT 



Length Length 
— 



Score Probability 




2 .7e-56 



Protein name 



Locus Name 



otnA protein 



pir:S70958



Acc# 



S70958 



Description 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


zi^asasz^ci.^^ 


.... 104 




5326 


965 


2898 


821 


l . oe-120 



Protein name 



Locus Name 



sp:YDIJ_ECOLI



Acc# 



P77748 



Description 

HYPOTHETICAL 113.2 KD PROTEIN IN LPP-AROD INTERGENIC REGION



ORF Name 



NTID 



NT AA ^ „ 
— , — , Score Probability 
AAID Length Length — ^ 



21£0.0.<m..±1...41.. 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 
2.2.3.45.Uaa...£.Z.. 



NTID 



AAID 



Length Length 
312 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






NT 



AA 



ORF Name 



NTID 



AAID 



TUT 



m7T 



— , — _ Score Probability 
Length Length 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



ORF Name 



NTID 



AAID 



NT AA 

- — , — , Score Probability 
Length Length A - 



23.5.ib.y.b.2..±2...10.b.. 



TUT 



war 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



\llZLll&2...kl..±Q2 1 [T^ 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



TTTT 



Length Length 



Score Probability 



T5*T 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



TTT 



Length Length 



— v, Score Probability 



T55~ 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



ORF Name 



NT ID 



AAID 



NT AA 

— , — , Score Probability 
Length Length J - 



TTT 



Protein name 



Locus Name 



indolepyruvate oxidoreductase, alpha subunit



Description 



pir:G69114



V.Ve-83 



Acc# 



G69114 



ORF Name 



Protein name 

Description 
NO-HIT 



NT ID 



AAID 



NT AA 

— , — , Score Probability 
Length Length ^ 



TTT 



Locus Name 



Acc# 



ORF Name 



Protein name 

Description 
XYLOSE REPRESSOR 



NT 



AA 



NTID 



AAID 



TT4 



Length Length 
[5TTT 



Score Probability 




1.0e-46 



Locus Name 



sp:XYLR_ANATH



Acc# 



Q44406 



NT 



AA 



ORF Name 



NTID 



AAID 



aA40.2Ll.7..7...±2...1D.3. I [TTH - 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NT ID 



AAID 



^4415903 ti 72 



Length Length 
1^- 



Protein name 



indolepyruvate ferredoxin oxidoreductase,
subunit beta (iorB) homolog



Description 



588 



Score Probability 
1.2e-27 



I3TU 



Locus Name 



pir:E69503



Acc# 
E69503 



ORF Name 



NT ID 



NT AA 

— - , — , Score Probability 
AAID Length Length • tL ~ 



TTT 



TIT 



3TT 



4.Se-94 



Protein name 



Locus Name 



WbpB 



gp:PAU50396



Acc# 



U50396 



Description 



Pseudomonas aeruginosa Wzz (Rol) (wzz (rol)) gene, partial cds, WbpA (wbpA),
WbpB (wbpB), WbpC (wbpC), WbpD (wbpD), WbpE (wbpE), Wzy (Rfc) (wzy (rfc)),
Wzx (wzx), HisH (hisH), HisF (hisF), WbpG (wbpG), WbpH (wbpH), WbpI (wbpI),
WbpJ (wbpJ), WbpK (wbpK), WbpL (wbpL), WbpM (wbpM) and WbpN (wbpN) genes,
complete cds, and UvrB (uvrB) gene, partial cds.



ORF Name 



NT ID 



|2.4.45jfi5i2...jca...5ajS J [ITS 



Protein name 



AAID 



NT AA 

— , — , Score Probability 
Length Length • L - 



T7TT 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



2.4.6.4.Q.a.7^....Cl...2.4^.. 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



TIT" 



TTTT 



i.3e-i97 



Locus Name 



putative aminotransferase 



gp:AF125164



Acc# 



AF125164 



Description 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes.






NT 



AA 



ORF Name 



NTID 



TZTT 



AAID Length Length 



1569 



Score Probability 
!i.le-25 



TUT 



Protein name 



Locus Name 



surface antigen BspA



pir:T31094



Acc# 



T31094 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 
2.4e-56 



Protein name 

Description 
METHIONYL-TRNA FORMYLTRANSFERASE,



Locus Name 



Acc# 



sp:FMT_BACSU



NT 



AA 



ORF Name 



NTID 



TIT 



AAID Length Length 



Score Probability 
|6.9e-il 



TFT" 



Protein name 



Locus Name 



unknown 



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



2.5A2.B.8.1.2....L3....XZX.. 



TIT 



341 



TTl 



Protein name 



Description 



Locus Name 



sp:Y973_METJA 



Acc# 



Q58383 



HYPOTHETICAL PROTEIN MJ0973






ORF Name 



NTID 



AAID 



NT AA n _ . , n , 
— , — , Score Probability 
Length Length 



TIT 



TTST 



TTT 



i.4e-i0 



Protein name 



Locus Name 



CapBT 



gp:SAU81973



Acc# 



U81973 



Description 



Staphylococcus aureus capsule gene cluster CapSA through CapBP genes,
complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



c3 36^ 



Length Length 



Score Probability 
ST7 



4 . Oe-55 



Protein name 



Locus Name 



chloride channel, probable, homolog



pir:F69426



Acc# 



F69426 



Description 



ORF Name 



NTID 



NT AA 
x — ^, T — ^. Score Probability 
AAID Length Length ^ 



2L&fc.7.&aa.7....aa...3t5i2.. 



T2T 



5348 



3T 



TTT 



|2.8e.-07 



Protein name 



Locus Name 



tachylectin-3



gp:AB017484 



Acc# 



AB017484 



Description



Tachypleus tridentatus mRNA for tachylectin-3, complete cds.



NT 



AA 



ORF Name 



NTID 



259.7.6. 5.1Q...C.2....2.6.3... 



TTT 



AAID Length Length 

— 



311 



Score Probability 



TTTe^T" 



Protein name 



Locus Name 



Acc# 



gp:ECMPL 



X03345 



Description 

E. coli npl gene for N-acetylneuraminate lyase subunit (EC 4.1.3.3).



NT 



AA 



ORF Name 



NT ID 



^604635 c3 355 



T7W 



AAID Length Length 
FJSu 



T7TT 



TTTT 



Score Probability 
£73 



Protein name 



Locus Name 



unknown 



gp:AF144879 



Acc# 
AF144879 



Description 



Leptospira interrogans rfb locus, complete sequence.






NT 



AA 



ORF Name 



NT ID 



2$3053l3 cl 242 



AAID Length Length 
FTFT — 



1T7T 



Score Probability 
T22 



0 . 00028 



Protein name 



Locus Name 



putative polysaccharide polymerase



ACC# 
U09239 



Description 



Streptococcus pneumoniae type 19F capsular polysaccharide biosynthesis
operon, (cps19fABCDEFGHIJKLMNO) genes, complete cds, and aliA gene, partial
cds.



NT 



AA 



ORF Name 



NT ID AAID Length Length 

130 



Score Probability 



T75~ 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NT ID AAID Length Length 

TJ1 



Score Probability 



1 m 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NTID 



cl 246 



TTT 



AAID Length Length 

m% — 



Score Probability 
TU2 



I.4e-05 



Protein name 



Locus Name 



DNA-binding protein HB



pir:C75600 



Acc# 



C75600 



Description 



NT 



AA 



ORF Name 



NTID 



TJT 



AAID Length Length 




Score Probability 
E53 



6.5e-24 



Protein name 



Locus Name 



sp:YYBO_BACSU 



Acc# 



P37489 



Description 

HYPOTHETICAL 45.2 KD PROTEIN IN COTF-TETB INTERGENIC REGION






NT 



AA 



ORF Name 



NTID 



AAID 



3.14I40.25....G2...2M I 



Length Length 



Score Probability 




2.7e~48 



Protein name 



Description 



Locus Name 



sp:3MG1_ECOLI



Acc# 



P05100 



IT 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



[7T 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






ORF Name 



NTID 



AAID 



NT AA 

— — , Score Probability 
Length Length • L - 



TUT 



2 .3e-07 



Protein name 



Locus Name 



unknown



gp:AF048749 



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



ORF Name 



52525512 ±1 17 



Protein name 



NTID 



TTT 



NT 



AA 



AAID Length Length 
T2T3S — 



Score Probability 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



NTID 



1242.$.£l2L..±2....lll I \T3$ 



Protein name 



DNA repair protein RAD25 homolog



Description 



NT 



AA 



AAID Length Length 
S^u" — 



T7T" 



sirs* 



Score Probability 

t^t — 



i.le-18 



Locus Name 



pir:F69294



Acc# 



F69294 



ORF Name 



NTID 



AAID 



TJT 



5361 



Protein name 



NT 



AA 



Length Length 
TT9 



Score Probability 



92 



Locus Name 



Acc# 



Description 






ORF Name 



NTID 



337 c3 353 



HIT 



Protein name 



AAID 



acetyltransferase homolog



Description 






NT 



AA 



Length Length 




Score Probability 




Locus Name 



pir:S70673



i.Ve^Bl 



Acc# 
S70673 



ORF Name 



Protein name 

Description 
NO-HIT



NT 



AA 



NTID 



AAID 



Length Length 
7¥~" 



Score Probability 



Locus Name 



Acc# 



ORF Name 



3.£U6.b.3.;L2...Gl...Z3.8... 



Protein name 

Description 
PT^TTTT 



NTID 



AAID 



NT AA 

— — Score Probability 
Length Length JL 



TTT 



Locus Name 



Acc# 



ORF Name 



Protein name 



NTID 



NT AA 
— — Score 
AAID Length Length — 



Probability 
5.7e-07 



Locus Name 



hypothetical protein 3



pir:S28487



Acc# 



S28487 



Description 






NT 



AA 



ORF Name 



NTID 



341S75S7 c2 297 



144 



AAID Length Length 




1341 



Score Probability 
TTZZ — 



l.le-124 



Protein name 



Locus Name 



ORF1P



gp:AB025970 



Acc# 



AB025970 



Description 



Plesiomonas shigelloides gene for ORF1P, ORF2P, ORF3P, ORF4P, ORF5P,
ORF7P, ORF8P, ORF9P, ORF10P, ORF11P.






ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length — 



36361063 c2 360 



T7TT 



TTTT 



2.0e-59 



Protein name 



Locus Name 



WbpH 



gp:PAU50396



Acc# 



U50396 



Description 



Pseudomonas aeruginosa Wzz (Rol) (wzz (rol)) gene, partial cds, WbpA (wbpA),
WbpB (wbpB), WbpC (wbpC), WbpD (wbpD), WbpE (wbpE), Wzy (Rfc) (wzy (rfc)),
Wzx (wzx), HisH (hisH), HisF (hisF), WbpG (wbpG), WbpH (wbpH), WbpI (wbpI),
WbpJ (wbpJ), WbpK (wbpK), WbpL (wbpL), WbpM (wbpM) and WbpN (wbpN) genes,
complete cds, and UvrB (uvrB) gene, partial cds.



NT 



AA 



ORF Name 



NTID 



AAID 



I35ixdi5....c^..js.s jrras 



Length Length 
TSTT 



Score Probability 
T53 



5 . 4e-ll 



Protein name 



Locus Name 



serine O-acetyltransferase,



pir:E53402



Acc# 



E53402 



Description 



ORF Name 



NTID 



J [IT7 



Protein name 



NT 



AA 



AAID Length Length 

stts — 



Score Probability 
|9.9e-122 



Locus Name 



bp:D.64132 



Description 

Porphyromonas gingivalis PorR and PorS genes, complete cds. 



Acc# 



D64132 






ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length •* 



TTT 



TTTT 



7.1e-07 



Protein name 



Locus Name 



Acc# 



spiYCeejSCOLI 



Description 



NT 



AA 



ORF Name 



NTID 



14035155 cl 250 



AAID Length Length 



5T7T 



Score Probability 
535 



Protein name 



Locus Name 



ribulose-5-phosphate 3-epimerase homolog yloR



pir:B69879



ACC# 



B69879 



Description 



NT 



AA 



ORF Name 



NTID 



I5TT 



AAID Length Length 
5T72 



1047 



Score Probability 
P53 



9.0e-i>5 



Protein name 



Locus Name 



conserved hypothetical protein BB0709 



pir:D70188



Acc# 



D70188 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



T5T 



5T7T" 



Length Length 



Score Probability 

-jtz — 



1.2e-31 



Protein name 

Description 
PUTATIVE ENDONUCLEASE BB0411,



Locus Name 



sp:NUC_BORBU 



Acc# 



O51372






NT 



AA 



ORF Name 



NT ID 



AAID 



4103375 cl 243 



T5T 



Length Length 



TTJT 



Score Probability 
Ml 



14 .ie-84 



Protein name 



Locus Name 



putative transferase 



gp:BBR007747 



Acc# 



AJ007747 



Description 



Bordetella bronchiseptica cosmid BbLPS1.



NT 



AA 



ORF Name 



NTID 



4457512 13 184 



T5T 



AAID Length Length 
— 



TTJuT" 



Score Probability 
TZ1 



'2 ,6e-.l6 



Protein name 



Locus Name 



conserved hypothetical protein MTH83 



pir:F69210



Acc# 



F69210 



Description 



ORF Name 



NTID 



NT AA 
T — _ T — _ Score Probability 
AAID Length Length — ^ 



\±±S.9££A...ci2J±B.2 1 ITS^ 



wnr 



TUT 



9.0e-06 



Protein name 



Locus Name 



Acc# 



probable NADH-plastoquinone oxidoreductase
subunit 



Description 



pir:C71018



C71018 



ORF Name 



NTID 



NT AA 

_ _ _ _ ' — _ — Score Probability 
AAID Length Length 



47.2lS.3.S.5...±3....17.3. 



T5TT 



ST7T" 



72T 



OFTI 



Protein name 



Locus Name 



Acc# 



probable purine NTPase PAB0812 



Description 



pir:F75103



F75103 






NT 



AA 



ORF Name 



NT ID 



AAID 



4786250 r2 ^ 



Length Length 



Score Probability 
TJS 



|4.3e-09 



Protein name 



Locus Name 



hypothetical protein MTH658



pir:E69187



Acc# 



E69187 



Description 



ORF Name 



NT ID 



AAID 



NT AA 
r — ^ T — . -i Score 
Length Length 



IT 



Protein name 

Description 
NO-HIT 



Locus Name 



Probability 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



5.3.5. St8A2....C.3....3.;LSL 



T5W 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT. 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



5£110A2...a±...lS± I IIB? 



Length Length 



Score Probability 
TJ1 



3.0e-05 



Protein name 

Description 
COME OPERON PROTEIN 3



Locus Name 



sp:CME3_BACSU



Acc# 



P39695 






NT 



AA 



ORF Name 



NTID 



5894001 ti 46 



T57T 



AAID Length Length 




1ST 



Score Probability 





0.020 



Protein name 
Description 



Locus Name 



sp:UDG aTkPY 



Acc# 
Q07172 



ORF Name 



NTID 



NT AA , , , „ , 
T — T — ^, Score Probab ility 
AAID Length Length JL 



c3 359 



161 



PIT 



Protein name 



Locus Name 



putative transferase 



gp:BBR007747



Acc# 
AJ007747 



Description 



Bordetella bronchiseptica cosmid BbLPS1.



NT 



AA 



ORF Name 



NTID 



6.ZS.a3.13....t2....1D.5... 



AAID Length Length 



3TT 



Score Probability 




6.5e-38



Protein name 



Locus Name 



transposase 



gp:AF038866 



Acc# 



AF038866 



Description 



Bacteroides fragilis transposon Tn5520 transposase (bipH) and mobilization
protein BmpH (bmpH) genes, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



:/.a:L5i2„.£2...m I ITS? 



Length Length 



Score Probability 



2¥T 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NTID 



AAID 



962777 tl 73 



5386 



Length Length 



F7T 



Score Probability 




Protein name 



Description 



Locus Name 



sp:XPT_BACSU



Acc# 



P42085 



XANTHINE PHOSPHORIBOSYLTRANSFERASE



ORF Name 



NTID 



AAID 



NT AA 
— — Score 
Length Length 



57" 



73" 



Probability 
0.013



Protein name 



Locus Name 



sp:HBSJE>ANPO 



Acc# 



P04244 



Description 
HEMOGLOBIN BETA CHAIN 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 



Score Probability 



T72~ 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



ia7.3.5.2.27....a2L...13.Q... 



TF7~ 



AAID Length Length 




1008 



Score Probability 
3.2e-07 



Protein name 



Locus Name 



actinorhodin polyketide dimerase-related
protein



pir :C72410 



ACC# 



C72410 



Description 






ORF Name 



NT ID 



— — Score Probability 

AAID Length Length 



|i0757tfJ7_cl_lly 



H5T 



WTE 1 II 2 41? 



I3T5T" 



1.4e-37



Protein name 



Locus Name 
sp:VK.KO_bA(J^U 



Acc# 
P54442 




ORF Name 



NTID 



AAID 



^ — score Probability 

Length Length 



li&B3biiJ:l_20 | [16 9 | P^ 1 



175" 



10 .00^0 



Protein name 



Locus Name 
lsp:t4XDi_BkAk^ 



ACC# 
O42370



Description 
HOMEOBOX PROTEIN HOX-D1 (FRAGMENT)



NT 



AA 



ORF Name 



NTID 



AAID 



l£5ilb.U^t^l..... | 

Protein name 



Length Length 
[2T7 



Score Probability 



Locus Name 



Acc# 



Description 




EI — Score Probability 



ORF Name 



NTID 



AAID Length Length 



TTT 



tut 



1.4e-42



Protein name 



Locus Name 



"putative flTP-bindxng protein 
Description 



] | gp:M'AcJuu4 ^ ] 



ACC# 



AC004786 



Arabidopsis thaliana chromosome II BAC genomic sequence
sequence.






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



14642187 t'l 61 



TTT 



raw 



T7TT 



b .2e-49 



Protein name 

Description 
HYPOTHETICAL PROTEIN HI0318



Locus Name 



sp:Y318_HAEIN 



Acc# 



P43984 



ORF Name 



NTID 



NT AA 
T — ^ T — Score Probability 
AAID Length Length — iL 



1S125662 c$ 164 



173 



T2W 



1.6e-33 



Protein name 



Description 



Locus Name 



Acc# 



gp:£)90§37 



E.coli genomic DNA, Kohara clone #347 (44.2-44.5 min.)



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length — 1 ~ 



I15.5.0,7..7.6.2...c3....16.a I [T7T 



Protein name 



Locus Name 



sp:YODE_KEAE 



Acc# 



Q01609 



Description 

HYPOTHETICAL 46.7 KD HftOTfilM IN 0£>r>E 3'ftfiflttMJ (OftM). 



NT 



AA 



ORF Name 



T7S 



NTID AAID Length Length 

— 



FIT 



Score Probability 
2.4e-47



Protein name 



Locus Name 



recR protein 



pir:H75547 



Acc# 



H75547 



Description 






NT 



AA 



ORF Name 



205a77b3_ai_lbb 



NTID AAID Length Length 



[TT6T 



Score Probability 
|3.0e-8« 



Protein name 



Locus Name 
sp:PATB_BACSU



Acc# 
Q08432 



Description 

PUTATIVE AMINOTRANSFERASE B



ORF Name 



Protein name 



Description 



NO-HIT



NTID 



NT AA Score Probability 

AAID Length Length 



[T7T 



[£T!T 



Locus Name 



Acc# 






ORF Name 



Protein name 



Description 
NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



NT AA Score Probability

Length Length - - 



T7F 



TUT 



Locus Name 



Acc# 



NTID 



NT AA Score Probability

AAID Length Length 

1.6e-li 



5401 



T7T 



T3T 



Locus Name 



Acc# 



P05332 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



T5TT 



"5402 



Length Length 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



— — Score Probability 
Length Length 

ET3 



AA 



ORF Name 



NTID 



AAID 



TBT~ 



7TT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 



NTID 



^ ^ Score 

AAID Length Length 



2.3.MBJ.0.^c2^1iL). .q p2 



TUT 



Probability 
4.7e-12



Protein name 



Locus Name 



IsprRIHUJW'l'JA 



ACC# 



Q58085 



Description 

PU T ATIVE JUBoif'UWlM Blo^NTkklS ia 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score 



HT3~ 



TTTT 



Probability 
3.5e-lfe 



Protein name 

cation efflux system (czcB-like)



Locus Name 



pir:E70342



Acc# 



E70342 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score 



T2T 



T7T 



Probability 

i.se-ia — 



Protein name 



Locus Name 



oxidoreductase, aldo/keto reductase family



pir:H72307



Acc# 



H72307 



Description 






ORF Name 



NT ID 



NT AA o , ( _ , 
_ — _ T — Score Probability 
AAID Length Length JL - 



TTT 



TTTTT 



Protein name 



Locus Name 



oxidoreductase, aldo/keto reductase family



pir:H72307



Acc# 



H72307 



Description 



NT 



AA 



ORF Name 



NTID 



I3.23.Z:Z5.S....C21...13.Z.. 



135" 



AAID Length Length 

— 



Z1W 



Score Probability 
T7Z 



1.3e-34



Protein name 



Locus Name



plant-metabolite dehydrogenase homolog yvgN



pir :C70040 



Acc# 



C70040 



Description 



NT 



AA 



ORF Name 



NTID 



3.3.3.5.21&.7....C1...120. I IIB7 



AAID Length Length 



Score Probability 
FITS" 



3.3e-59



Protein name 



Locus Name 



oxidoreductase, aldo/keto reductase family



Description



pir:H72307



Acc# 



H72307 



NT 



AA 



ORF Name 



Mm3.&7....c&...13.i I IITO 



NTID AAID Length Length 



Score Probability 



Protein name 
Description 

NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



ibiiiiaifi...ci.,.iii .1 \rw$ 



AAID Length Length 



Score Probability 
2.0e-75 



75T 



Protein name 



Locus Name 



oxidoreductase, aldo/keto reductase family



pir:A72308 



Acc# 



A72308 



Description 






NT 



ORF Name 



NTID 



AAID Length Length



— Score Probability 



394857b &A VA6 



rrrr 



1.1e-56



Protein name 



Description 



Locus Name 
sp:YF08_METJA



Acc# 



Q58903 



HYPOTHETICAL ABC TRANSPORTER ATP-BINDING PROTEIN MJ1508



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 
8.2e-13


4064l78_tlJ>b 


191 


54l3 


455 


l3£& 


197 





Protein name 



Locus Name



aspartate aminotransferase



gp:AF035157



Acc# 



AF035157 



Description 



Lactococcus lactis aspartate aminotransferase (aspC) gene, complete cds.



ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 
5.4e-^4 





192 


5414 


498 


14 9 7 


370 





Protein name 



hypothetical protein



Locus Name
pir:S75887



Acc# 



S75887 



Description 



ORF Name 



NT 



AA 



NTID 



M10.1i.b...±Z Ii: 5A .., 



T5T 



AAID Length Length 



— Score Probability 



ITT" 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT






ORF Name 



14486261 cl 121 



Protein name 

Description 
NO-HIT



NTID 



AAID 



NT AA 

— L1 — , Score Probability 
Length Length 



TOT 



Locus Name 



Acc# 



ORF Name 



Protein name 



NTID 



AAID 



4saa&2J...±a...aa i irss 



ygge hypothetical protein 



Description 



NT 



AA 



Length Length 



FuTT 



Score Probability 
5.1e-22



Locus Name 



pir:H72114 



Acc# 



H72114 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length — ■ ^~ 



i.71b..7.1£5....cl...lD.a I 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



6!2.&±1.±1J12 1 fT57 



Length Length 

m$ — 



Score Probability 



82 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



542 0 



Length Length 

Tn — 



Score Probability 



TTET 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 
ST 



Score Probability 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



ni4a45i..±i„.m i 



Length Length 
TIT 



Score Probability 
126 



Protein name 



Locus Name 



hypothetical protein yngA 



pir:F69892



Acc# 



F69892 



Description 



NT 



AA 



ORF Name 



imia&2 i .±a...i3A.... I Put 



NTID AAID Length Length 

pszs — 



Score Probability 
2570 



Protein name 



Locus Name 



hypothetical protein mexF 



pir:T30830



Acc# 



T30830 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



ll£fii2L2...c3L...23.4 IP 



Protein name 



Locus Name 



ct469 hypothetical protein 



pir :D72060 



ACC# 



D72060 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 
TUT 



Score Probability 



WIT 



Protein name 



Description 
NO-HIT



Locus Name 



Acc# 








ORF Name 


NTID 


AAID 


NT AA 

— Score 
Length Length 


Probability 




12712827__rJ_l2U 


204 


5426 


T51 1052 276 


5.0e-24 






Protein name 






Locus Name 




conserved hypothetical protein 


pir:F72386


F72386 






Description 












ORF Name 


NTID 


AAID 


NT AA 

^ •*•„ — Score 

Length Length 


Probability 
3.0e-06 




12$.222ti:2..±±..±Lb. 


.. 205 


5427 


3TS 951 132 








Protein name 






Locus Name


Acc# 




1 


hypothetical protein aq_380 




pir:A70334


A70334 






Description 












ORF Name 


NTID 


AAID 


NT AA 
— — , Score 
Length Length 


Probability 




XllOA6A..±l..±L& 


|206 


5428 


S3 - ■ 282 








Protein name 






Locus Name 


Acc# 


O 




Description 
















HIS 




ORF Name 


NTID 


AAID 


^ ^ Score 
Length Length 


Probability 
8.9e-34 


1; ^ . 


ll&l&AZ^t'2.J.U 


... 207 


5429 


T33 405 368 








Protein name 






Locus Name 


Acc# 












sp:YYAH_BACSU


P37516 






Description 














HYPOTHETICAL 14.4 KD PROTEIN IN TETB-EXOA INTERGENIC REGION


(ORPP) | 






ORF Name 


NTID 


AAID 


NT AA 
— — , Score 
Length Length 


Probability 






±k5MAlb....z±..±M 


.... 208 


5430 


235 711 166 








Protein name 






Locus Name 


Acc# 






hypothetical protein MTH919




pir:G69225


G69225



Description 






NT 



AA 



ORF Name 



NTID 



146469bb c2 Abl 



AAID Length Length 




Score Probability 



MIT 



ST 



0.023



Protein name 



Locus Name 



mannanase 



gp:U96771



Acc# 



U96771 



Description 



Prevotella bryantii putative polygalacturonase, B-1,4-endoglucanase, and
mannanase genes, complete cds; and unknown genes



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


i470S261_ti_jL5 


210 


5432 


501 


1506 


115 


0.00052 


Protein name 








Locus 


Name 


Acc# 



unknown protein 



Description 



Bacillus subtilis icione pED4> comu- 
operon, complete cds. 


(1,2, 4,4,5, 6, and 


7) proteins 


mcomG 




ORF Name NTID 


AAID 


NT AA 
Length Length 


Score Pr 


obability 


I47.MD.li)...±1...12y. 211 


5433 


" 265 -798 


275 


6.3e-24 



Protein name 



Locus Name 



conserved hypothetical protein y^KA 



pir:E69851



Acc# 



E69851 



Description 









NT 


AA 




Score 


Probability 


ORF Name 


NTID 


AAID 


Length 


Length 










±±bi6:xi±.±2..m 


212 


5434 


657 


2004 




144.5 


1.1e-147


Protein name 








Locus 


Name 




Acc#



DNA ligase



gp:BST01I676 



Description 

Bacillus stearothermophilus lig gene.






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



15641^02 ci 288 



2TT 



5435 



Score Probability 
2.4e-31



Protein name 



Locus Name 



conserved hypothetical protein AF^ui 
Description 



pir:A69525



Acc# 



A69525 



NT 



AA 



ORF Name 



NT-ID 



AAID Length Length 



T3T5" 



Score Probability 
13 .le-06 



Protein name 



hypothetical protexn AKibbv 



Locus Name 
pir:B69483



Acc# 



B69483 



Description 



NT 



AA 



ORF Name 



NT ID 



AAID 



l&Bl&lbzjLiJill | 



Length Length 
■JT75" 



Score Probability 
|1.2e-7S 



Protein name 



Description 



Locus Name 



sp:PYRD_AQUAE



Acc# 



O66461



ORF Name 



NTID 



AAID 



NT 

Length Length 



AA 

— , Score 



TTZF 



Probability 
1.8e-177



Protein name 



Locus Name 



hypothetical protein



ACC# 
JQ1020 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



=3 



Score Probability 
3.5e-34



Protein name 



Locus Name 



conserved hypothetical protein AF1B78 
Description 



pir:E69484



Acc# 



E69484 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 
TT9 



Score Probability 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 



NTID 



AAID 



^ ^ Score 
Length Length 



Probability 



[5441 



8.i>e-20 



Protein name



Description 



Locus Name 



Acc# 



P54459 



HYPO'l'^l'lCAL 4o.b Kb PftO'l'm* lfl COMacJ-R^T lOTfikcjlWlti keloid 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



7TT 



Score Probability 
|2..5e-2:i 



T71T 



Protein name 

conserved hypotneticai protein aq_u«b 



Locus Name 
pir:F70420



Acc# 



F70420



Description 



^ — score Probability 

Length Length ~ 

\TI^> 



AA 



ORF Name 



NTID 



AAID 



fltimiinnE: 



rzrr 



5443 



1% 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT






ORF Name 



215ibb6 tl 2V 



Protein name 



NTID 



TIT 



NT 



AAID Length Length 



AA 

— Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



TIT 



5445 



NT 



AA 



AAID Length Length 
TT£ 



— Score Probability 



61 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



TTT 



conserved hypothetical protein



Description 



— — Score Probability 
Length Length 

TFT" 



Locus Name 



BTrTTTT^W 



1.3e-16 



Acc# 



E72209 



ORF Name 



Protein name 



NTID 



AAID 



— — s core Probability 

Length Length 



TTT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



TIZT 



NT 



AA 



AAID Length Length 



Score Probability 



IZE 1 [7W 



Locus Name 



Acc# 



Description 



NO-HIT






ORF Name 


NT ID AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


23633312__tl_7 


227 5445 


248 


747 193 






Protein name 






Locus Name 




Acc# 








gp:APU7223^ 




U72238 
















Anabaena ee<JVl20 


"OEFftl, Oitffci, okPki, 




and OKFR5 genes, 


com 


plete 




sequences . 














ORF Name 


NTID AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


236358l2J:l_y 


T28 S450 


784 


23SS 148 







Protein name 



Locus Name 



conserved hypothetical protein AF1017



pir:A69377



Acc# 



A69377



Description 



ORF Name 


NTID 


AAID 


NT AA 
— • — , Score 
Length Length 


Pr 


obability 


215£Ab.L'L.±±..Ab. 


229 


- 5451 


131 396 136 




4.1e-08


Protein name 






Locus Name 




Acc# 
U73653 




bs Kua protein 






gp:MBU73653






Description 














Mycobacterium bovis


63 kDa 


protein, 


47 kDa protein ana cipjs-gene 


complete 




cds . 














ORF Name 


NTID 


AAID 


NT AA 

— Score 
Length Length 


Probability 




2±15£SAl.±l..±±b. ... 


230 


5452 


383 1152 









Protein name 



Locus Name 



Acc# 



Description 



NO-HIT






ORF Name 



24316061 tl b 



Protein name 



Description 



NT 



AA 



NTID 



TTT 



AAID Length Length 
[STT- 



Score Probability 



Locus Name 



Acc# 



NO-HIT



ORF Name 



243.Mly.A...cl...lb t l.. 



Protein name 



Description 



NT 



AA 



NTID 



AAID 



TTT 



Length Length 
TT7 



Score Probability 



3ff 



Locus Name 



Acc# 



NO-HIT



ORF Name 



Protein name 



Description 
NO-HIT 



NTID 



AAID 



NT AA 
Length Length 



Score Probability 



TIT 



TWE~ 



Locus Name 



Acc# 



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



TIT 



TTTT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



2MM0.i:A^al^lb.y... 



Protein name 



NTID 



AAID 



TI*T 



hypothetical protein jnpuby4 
Description 



— ' AA : ■ Scor e Probability 
Length Length 

8 .9e-8S> 



Locus Name 



pir:F71901



Acc# 



F71901 






ORF Name



NT ID 



AAID 



NT AA 

— , — , Score Probability 
Length Length ** 



24500032 U 121 



1839 



7.7e-154 



Protein name 
Description 



Locus Name 



sp:SYD_BACSU



ACC# 



O32038



NT 



AA 



ORF Name 



NT ID 



24642760 C3 311 



AAID Length Length 
5455 



Score Probability 



1.2e-205 



Protein name 



Locus Name 



L-fucose isomerase



gp:AF137263



Acc# 



AF137263 



Description 



Bacteroides thetaiotaomicron 30S ribosomal protein S16-like protein, fucose
gene cluster, and RNA polymerase sigma factor SigZ-like protein (sigZ) genes,
complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



24.7.26.5. 5.a...£2...6.1.. 



Length Length 

or 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



ORF Name 



NTID 



NT AA 

— , — Score Probability 
AAID Length Length Jl - 



TUT 



¥^5~ 



F8~ 



0.0i!i 



Protein name 



Locus Name 



ATP synthase F0, subunit b'



pir :A64662 



ACC# 



A64662 



Description 






ORF Name 



■24S97y4J ±1 12 



Protein name 



NT ID 



NT AA 

— — Score Probability 
AAID Length Length 



FT 



T5B" 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



Description 



NTID 



— — Score Probability 
AAID Length Length 



2T7~ 



8.5e-18 



Locus Name 



gp:AB024563



Acc# 



AB024563 



bacillus halodurans gene ior tttflL, VF1M, YHd4, HMp and ARGE, complete 

cds . 



ORF Name 



Protein name 



NT 



AA 



NTID AAID Length Length 

TZZ 



Score Probability 



FT" 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



NT AA 

— — Score Probability 
Length Length 



7^5" 



lW 



0.0015 



Locus Name 



sensory transduction system regulatory
protein slr1837



pir:S77341 



Acc# 



S77341 



Description 






NT 



AA 



ORF Name 



26578375 ti 100 



NTID AAID Length Length 

— 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



27.415.41.±2..„111.. 



2T5~ 



Length Length 

rar? — 



Score Probability 
0.0053



TXT 



Protein name
Description

PREPROTEIN TRANSLOCASE SECY SUBUNIT



Locus Name 



sp:SECY_ANTSP



Acc# 



Q37143 



NT 



AA 



ORF Name 



22^ 



NTID AAID Length Length 

5T£3 



S5T 



Score Probability 

— 



3.8e-26



Protein name 



Locus Name 



XylR 



gp:BSU15985



Acc# 



U15985 



Description 



Bacillus stearothermophilus endo-beta-1,4-xylanase (xynA) gene, complete
cds.



NT 



AA 



ORF Name 



NTID 



2$.3AA65.2...±±...ll t ... I I2T7 



AAID Length Length 



5469 



Score Probability 
1.1e-40



4T3 



Protein name 



Locus Name 



sp : PYRZ_BACSU 



Acc# 



P25983 



Description 

DIHYDROOROTATE DEHYDROGENASE ELECTRON TRANSFER SUBUNIT






NT 



AA 



ORF Name 



NTID 



AAID 



25537532 r3 117 



[5T7TT 



Length Length 



Score Probability 




9.^e~0 7 



Protein name 



Locus Name 



hypothetical protein Rv2816c



pir:C70691



Acc# 
C70691 



Description 



ORF Name 



NTID 



NT AA 

_ _ _ _ — _ _ — Score Probability 
AAID Length Length ^ 



^3.2.a5iQ13....al...X5.7.., 



5T7T" 



7¥T 



6 ,8e-5S 



Protein name 



Description 



Locus Name 



sp:TRMD_BACSU



Acc# 



O31741



METHYLTRANSFERASE



ORF Name 



NTID 



NT AA 
Y — T — Score Probability 
AAID Length Length • L - 



MD.1S£7.7..±1..111 I I^TT 



5T7T 



TTT 



7.0e-18



Protein name 



Locus Name 



hypothetical protein CTS^8 



pir:A71519



Acc# 



A71519 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



M0.S.a:76.0...±1...2S. I P5T 



Length Length 

tss — rprass — 



Score Probability 
5^ 



2.9e-58



Protein name 



Locus Name 



conserved hypothetical protein yqfO



pir :A69954 



ACC# 



A69954 



Description 



ORF Name 



NTID 



AAID 



Protein name 

Description 
NO-HIT



NT AA o ^ ^ . . _ . ^ 
T — J- . — , Score Probability 
Length Length 



63 



T5T 



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



1191 



Score Probability 
3.2e-77



7TF 



Protein name



hypothetical protein HP0049



Locus Name
pir:A64526



ACC# 



A64526 



Description 



ORF Name 



NTID 



AAID 



NT AA ScQre 
Length Length 



1 [2TT£ 



fTTT 



Probability 
2.9e-09



Protein name 
Description 

HYPOTHETICAL ABC TRANSPORTER



Locus Name 
sp:YBJZ_ECOLI



Acc# 



P75831 



ATP-BINDING PROTEIN YBJZ



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score 



5477 



Probability 
1.2e-79 



Protein name 



Description 



Locus Name 



sp:B10JM*A<JfcW 



Acc# 



P22806 



LI^E) 



— — Score Probability 
Length Length 





AA 



ORF Name 



NTID 



AAID 



5T7¥" 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






ORF Name 



Protein name 



NT ID 



AMP nucleosidase



Description 



— — Score Probability 



AAID Length Length 
7W5 



FTTT 



Locus Name 



pir:A72021



Acc# 



A72021 



ORF Name 



Protein name 



NTID 



NT 



AA 



AAID Length Length 
TT^ 



Score Probability 



[TOT 



Locus Name 



Acc# 



Description 



NO-HIT 






ORF Name 



Protein name 



OprM 



Description 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



W7T 



1416 



1.1e-49



Locus Name 



gp:AB011381



Acc# 



AB011381 



Pseudomonas aeruginosa gene for OprM, complete cds.



ORF Name 



NTID 



— — Score Probability 
AAID Length Length ™ 



3.M213..7„.±2...b.i.. 



5482 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT






NT 



AA 



ORF Name 



NT ID 



'394S962 cl 165 



AAID Length Length 

— 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



ORF Name 



NT ID 



NT AA 

^ , ^ — . — . Score Pro bability 
AAID Length Length z ~ 



iaS.S6AO....c2...2b.O. I YZ&I 



T5T 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NT ID 



&XQ33.B3....Z2....2.Q.X.. 



AAID Length Length 
— 



TIF" 



T7F" 



Score Probability 
164, 



1.7e-12



Protein name 



Locus Name 



Acc# 



sp:Ym^ECOLI 



Description 

HYPOTHETICAL 14.1 KD PROTEIN IN NFNB-ENTD INTERGENIC REGION



ORF Name 



NTID 



AAID 



NT AA 
T T — Score Probabilxty 
Length Length 



74T" 



1.3e-14



Protein name 

Description 
HYPOTHETICAL PROTEIN 



Locus Name 



sp:Y978_METJA



Acc# 
Q58388 






NT 



AA 



ORF Name 



NT ID 




AAID Length Length 



Score Probability 



i.0e-4i 



Protein name 



Locus Name 



hypothetical protein mexJJ 



pir:T30829



Acc# 



T30829 



Description 









NT 


AA 


Score 


Probability 


ORF Name 


NTID 


AAID 


Length 


Length 








45.11bJ.b..,±J....Il^ 


266 


54S8 


412 


123^ 


150 




7.0e-10


Protein name 








Locus 


Name 




ACC# 



IgpTYFTuTEm" 



AL031866 



Description 

Yersinia pestis 102 kbases unstable region: from 1 to 119443.



NT 



AA 



ORF Name 



NTID 



45.35.6.7.:/....cl...ib.4.. 



AAID Length Length 



Score Probability 
|2.5e-^0 



2TT 



Protein name 



Locus Name 



sp:CAT3_ECOLI



Acc# 



P00484 



Description 

CHLORAMPHENICOL ACETYLTRANSFERASE III



ORF Name 



NTID 



NT AA 
— — , Score 
AAID Length Length 



Protein name 



Description 



Locus Name 



Probability 



Acc# 



NO-HIT






EE Score Probability 



ORF Name 



NT ID 



AAID Length Length 



7T" 



0.020 



Protein name 

NADH dehydrogenase subunit 4L



Locus Name 
gpsBMMri'uUHul 



Acc# 



AF110610 



Description

Boophilus microplus NADH dehydrogenase subunit 4 (ND4) gene, partial cds;
NADH dehydrogenase subunit 4L (ND4L) gene, complete cds; tRNA-Thr and
tRNA-Pro genes, complete sequence; and NADH dehydrogenase subunit 6 (ND6)
gene, partial cds, mitochondrial genes for mitochondrial products.



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


4SS7087jtl_iy 


210 


S45>2 


139 


420 






Protein name 








Locus 


Name 


Acc# 


Description 














NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


4aaaai^»±i...ii 


271 


S4S3 


74$ 


2250 


156 


§.6e-06 



Protein name 



Description 



Locus Name 



sp:Y797_METJA



Acc# 



Q58207 



HYPOTHETICAL PROTEIN MJ0797



ORF Name NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


mX7.6.5...±3....1M 5494 


141 426 




- l.6e-uv 


Protein name 


Locus Name 


Acc# 


conserved hypothetical protein yknZ


pir:E69858


E69858 



Description 






NT 



AA 



ORF Name 



NTID 



50843&1 c3 rib 



AAID Length Length 
5495 



T3T 



Score Probability 



Protein name 



Locus Name 



FucR 



gp:AF137263



Acc# 



AF137263 



Description 



Bacteroides thetaiotaomicron 30S ribosomal protein S16-like protein, fucose
gene cluster, and RNA polymerase sigma factor SigZ-like protein (sigZ) genes,
complete cds.



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length — J - 



S1S7812 il 13 



T7T 



543S" 



Protein name 



Locus Name 



sp:YF08_METJA



Acc# 



Q58903 



Description 

HYPOTHETICAL ABC TRANSPORTER ATP-BINDING PROTEIN MJ1508



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
5497 



TOT" 



Score Probability 



7 . be-21 



Protein name 



Locus Name 



amino acid ABC transporter, ATP-binding 
protein 



|pir:H72356 



Acc# 



H72356 



Description 



ORF Name 



Protein name 



gpc 



NT 



AA 



NTID 



AAID 



Length Length 
T7W 



Score Probability 



Locus Name 



Acc# 



gp:AF063097 



Description 

Bacteriophage P2, complete genome . 



ORF Name 



NTID 



AAID 



— , ^ — , Score Probability 
Length Length J - 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



6ALZB.XZ...CZ...iay. I 



AAID Length Length 
bbOO 



2061 



Score Probability 
8 . Ve-233 



Protein name 



Locus Name 



high temperature protein HtpG



gp:AF176245



ACC# 



AF176245 



Description 



Porphyromonas gingivalis high temperature protein HtpG (htpG) gene, complete
cds.



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 



Score Probability 



TUT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



.7.a.7.5.5..7...±3....1i3...... I 



AAID Length Length 
J5U2 — 



JUT 



Score Probability 



Protein name 



Locus Name 



dihydrodipicolinate synthase



pir:B72246



Acc# 



B72246 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



507574 cl 14y" 



1877 



1.1e-193



Protein name 



Locus Name 



Acc# 



P37571 



Description 

NEGATIVE REGULATOR OF GENETIC COMPETENCE MECB



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


585l505_t3_li:i 


- 282 


5504 


236 


717 


105 


1.5e-0S 



Protein name 



Locus Name 



hypothetical protein 



gp:SEL243707



Acc# 



AJ243707 



Description 

Synechococcus elongatus petB gene, petD gene and ORF1.



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



1WT 



5505 



T7T" 



7TT 



g.le-74 



Protein name 



Locus Name 



ATP synthase F1, subunit alpha



pir:F72231



Acc# 



F72231 



Description 



NT 



— Score Probability 



ORF Name 



NTID 



AAID Length Length 



l3.3.5^.2...±i....iib.. 



T84 



F5uT" 



2^ 



1.2e-b4 



Protein name 



Locus Name 



hypothetical protein



lgp:^1^242827 



ACC# 



AJ242827 



Description 

Streptomyces tendae aip gene a nd Okh l 2 (partial), strain Tue^ul/ac. 



ORF Name 



Protein name 

Description 
NO-HIT



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length ^ 



TO - 



Locus Name 



Acc# 



ORF Name 



NT 



AA 



NTID 



|I45ttim...aa...iIi I ETC 



AAID Length Length 

— 



Score Probability 
6.6e~45 — 



FT7T 



Protein name 



Locus Name 



conserved hypothetical integral membrane 
protein HP1184 



Description 



pir:H64667



Acc# 



H64667 



ORF Name 



145.110.0..7...±1...2... 



Protein name 



Description 



NO-HIT



NT 



AA 



NTID 



AAID 



5509 



Length Length 
TJTT 



Score Probability 



Locus Name 



Acc# 



ORF Name 



NT 



AA 



NTID 



AAID 



\±A5ABA11.±1..±1B. I ISTO 



5510 



Length Length 
— 



TFT 



Score Probability 
0 . 0015 



TT7 



Protein name 



Locus Name 



conserved hypothetical protein yknZ



pir:E69858



ACC# 



E69858 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
55TI — 



Score Probability 




Protein name 



Locus Name 



antibiotic resistance protein homolog ywoG 



pir:H7o065 



Description 



|4.Se-l7 



Acc# 



B70065 






ORF Name 



NT ID 



NT AA „ ^ ^ , . , ^ 
— — Score Probabilxty 
AAID Length Length - J ~ 



15104137 ti 1 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 






ORF Name 



NT ID 



NT AA 
T — _ T — ^, Score Probability 
AAID Length Length JL 



\15£.12All...al...l21 1 [25T 



1.0e-78



Protein name 



Locus Name 



Salmonella typhimurium transcriptional



gp:STYSTMPl 



Acc# 



AF170176 



Description 



Salmonella typhimurium fragment STMF1.



NT 



AA 



ORF Name 



NTID 



73T" 



AAID Length Length 
F5T3 — 



41)9 



TOT 



Score Probability 
l.Se-9S 



Protein name 
Description 

PROBABLE URACIL PERMEASE (URACIL TRANSPORTER) 



Locus Name 



sp:URAA_HAEIN



Acc# 



P45117 



NT 



AA 



ORF Name 



NTID 



AAID 



\±6±116.&2....t±..A5. I 12^3" 



Length Length 
TUT 



Score Probability 
3 .4e-78 



537 



Protein name 

Description 
ATP SYNTHASE ALPHA CHAIN,



Locus Name 



sp:ATPA_RI<!l>ft 



Acc# 



O50288



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score 



16828462 tl_4b 



TFT 



Probability 
1 .4e-44 



Protein name 



Description 



Locus Name 



sp:ATPG_BACSU



Acc# 
P37810 



ORF Name 

c^i Ab4 



NTID 



AAID 



NT AA 
— — , S core 

Length Length 



1755" 



Probability 
12 ,0e-05 



Protein name 



Locus Name 



3',5'-cyclic-nucleotide phosphodiesterase,
cpdA homolog MTH178:Icc related protein



pir:F69104



ACC# 



F69104 



Description 



NT 



AA 



ORF Name 



NTID 



2.1S.3.0.y.la...cl^l.A. 



AAID Length Length 

toi — 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



12&6A±8.1^a±^±^ | 



ITT 



Length Length 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



HelC



Description 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



5520 



TFT 



Locus Name 



gp:LPU11704



Legionella pneumophila HelC (helC) gene, complete cds.



a.5e-0S 



Acc# 



U11704 



ORF Name 



Protein name 



NTID 



AAID 



NT AA 

Score Probability
Length Length • 



TTT 



Locus Name 



y . 5e-l4 



Acc# 



Description 



sp:GS1_HUMAN



Q08623 



GS1 PROTEIN



ORF Name 



Protein name 



NTID 



TOO" 



transcription regulator, crp family



Description 



NT 



AA 



AAID Length Length 



Score Probability 



Locus Name 



pir:F72285



Acc# 



F72285 



ORF Name 



NTID 



AAID 



2L4fiLflLasia...ci...i&i I nnn: 



Protein name 



NT 



AA 



Length Length 
73— 



Score Probability 



2TT 



Locus Name 



Acc# 



Description 
NO-HIT






NT 



AA 



ORF Name 



NTID 



AAID 



TUT 



Length Length 
TTT 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



5i4iin.7.R..±i...iiD. I vnn 



Length Length 
TU1 1 



Score Probability 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



M2iaD.3.2..±2...9.3. I [JUT 



AAID Length Length 
273 



Score Probability 
TTT — 



■5.2e-I3 



Protein name 



Description 



Locus Name 



sp:ATPL_ANASP



Acc# 



P12409 



ATP SYNTHASE C CHAIN, (LIPID-BINDING PROTEIN)



NT 



AA 



ORF Name 



NTID 



\1&5ALX1.±1.A2. I ETO 



AAID Length Length 
— 



TTT 



Ti4ir 



Score Probability 
¥STT 



1.2e-45 



Protein name 



Locus Name 



sensory transduction system regulatory
protein sll1229:protein sll1229:protein
sll1229



pir:S75524



Acc# 



S75524 



Description 






NT 



AA 



ORF Name 



NTID 



24333127 c3 332 



AAID Length Length 




Score Probability 
7T3 



1.3e-76 



Protein name 



Locus Name 



Acc# 



P31473 



Description 

HYPOTHETICAL 56.4 KD PROTEIN IN ASNA-KUP INTERGENIC REGION



ORF Name 



NTID 



AAID 



24356417 c2 250 



TOT 



Protein name 



hypothetical protein jhp0336 



Description 



NT 



AA 



Length Length 
2991 



Score Probability 
!W 



Locus Name 



pir :C71944 



3.4e-l5 



Acc# 



C71944 



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length — i - 



2ll9.9.0.15....al...l±2 1 BUS 



TST" 



0.049 



Protein name 



Locus Name 



nonstructural protein 



|gp:AF012732 



Acc# 



AF012732 



Description 



Bovine viral diarrhea virus strain Yak nonstructural protein (p125) mRNA,
partial cds.



ORF Name 



NTID 



AAID 



NT AA 

— ^ — ^ Score Prob ability 
Length Length — 



I pus 



TT2 1 ITST 



7.3e-23 



Protein name 

Description 
THIOREDOXIN (TRX)



Locus Name 



sp:THIO_BORBU 



Acc# 
O51088






ORF Name 



NT ID 



NT AA 

— — , Score Probability 
AAID Length Length z ~ 



24414153 ±1 20 



1347 



0.0040 



Protein name 



Locus Name 



unknown



gp:U96771 



Acc# 



U96771 



Description 



Prevotella bryantii putative polygalacturonase, B-1,4-endoglucanase, and
mannanase genes, complete cds; and unknown genes.



ORF Name 



NTID 



AAID 



244S54S2 c2 26$ 



Protein name 



long-chain-fatty-acid CoA ligase



Description 



NT AA
— , T — , Score Probability 
Length Length ^ 



2.0e-58 



Locus Name 



pir :D70386 



ACC# 



D70386 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length ^ 



&4S.Mu&&„±2„.M I 



WT 



1.6e-li 



Protein name 



Description 



Locus Name 



Acc# 



P35111 



ATP SYNTHASE EPSILON CHAIN, 



NT 



AA 



ORF Name 



NTID 



AAID 



[313 



Length Length 

— 



Score Probability 
3.Se-26 — " 



Protein name 

Description 
HELA PROTEIN 



Locus Name 



sp:HELA_LEGPN



Acc# 
Q48815 






ORF Name 



24875042 ±2 147 



Protein name 



NTID 



rrnr 



NT AA 

— — , Score Probability 
AAID Length Length 



5536 



TH7 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



CM" 



7TT 



Locus Name 



5 . ie-16 



Acc# 



Description 
D-XYLAN XYLANOHYDROLASE B)



sp:XYNB_BUTFI



P26223 



ORF Name 



NTID 



NT AA 

Score Probability
AAID Length Length — 



15AlMAl...al...ll$ I \T±Z 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



ITT 



NT AA
— — Score Probability 
AAID Length Length ~ 



254 



Locus Name 



'3.7e-0.5 



Acc# 



receptor antigen (RagA) 



Description 



I gp : l >miaOB 7 ^ 



AJ130872 



Porphyromonas gingivalis W50 receptor antigen (rag) locus encoding a major
immunodominant 55kDa antigen.



NT 



AA 



ORF Name 



NTID 



AAID



126831386 tJ, Uy 



Length Length 
Tu"5Ti 



Score Probability 



1.7e-48 



Protein name 



Description 



Locus Name 



sp:PYRD_ECOLI



Acc# 



P05021 



NT 



AA 



ORF Name 



NTID 



AAID 



ITT 



Length Length 



TTTT 



Score Probability 
3.5e-28 



715 



Protein name 



Locus Name 



probable ATP-dependent helicase



pir:A71805



Acc# 



A71805 



Description 



ORF Name 



NTID 



AAID 



NT 

Length Length 



AA 

— , Score 



Probability 
4.8e-ii7 



Protein name 



Description 



Locus Name 



sp:C!2Cb_AL(JtjP 



Acc# 



P94176 



CAT I ON INFLUX ^Tkl M PkOTEIi^ l^cJb 



ORF Name 



NTID 



AAID 



NT 

Length Length 



AA 

— , Score 



ITT 



ITT 



Probability 
5.3e-i€> 



Protein name 



Locus Name 



HSK outer membrane protein precursor 
protein 



:SusC 



pir:JC6027



Acc# 



JC6027 



Description 






ORF Name 



NTID 



AAID 



NT AA 
— — score 
Length Length 



29973itf2 c'2 246 



\TMT 



Probability
I U.4e-105 



Protein name 



Locus Name 



(p)ppGpp synthetase



|gp:BSU86377 



Acc# 



U86377 



Description 



Bacillus subtilis (p)ppGpp synthetase (relA) and
adenine phosphoribosyltransferase (apt) genes, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



313lSlO c2 '±b& 



Length Length 
— 



Score 



T5T 



WTT 



Probability 
I6.0e-.S3 



Protein name 



Description 



Locus Name 



sp:YCGO_ECOLI



Acc# 



P76007 



PUTATIVE NA(+)/H(+) EXCHANGER YCGO



ORF Name 



NTID 



AAID



NT AA 
— — Score 
Length Length 



3.1&5.&0.D....al..3.0.2 1 1327 



5546 



TIT 



ITT" 



Probability 
5-.5e-09 



Protein name 



Locus Name 



gp : AB01587y 



ACC# 



AB015879 



Description 

Porphyromonas gingivalis dnaK operon genes, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



3.2.6.^.0.D.ll..±3....iS.O... 



325" 



Length Length 
T2*B — 



Score 



415 



Probability 
5..8e-24 



Protein name 

Description 
ATP SYNTHASE A CHAIN, (PROTEIN 6) 



Locus Name 



sp:ATP6_SH0MJ 



Acc# 



P15012 






NT 



AA 



ORF Name 



NTID 



AAID 



31240822 ±2"S2 



Length Length 
ETOS 



Score Probability 



885 



Protein name 
Description 

ffrcrmr 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



\JTT 



5549 



Length Length 



Score Probability 
0.024 



Tin: 



Protein name 



Locus Name 



conserved hypothetical protein



pir:G72385



Acc# 



G72385 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



ia3Laa22ii...ci...iai i prs 



Length Length 



Score Probability 
2TIu 



b.6e-16 



Protein name 



Locus Name 



diacylglycerol kinase 



gp:BSU29177



Acc# 



U29177 



Description 



Bacillus subtilis PhoH (phoH) gene, partial cds, diacylglycerol kinase (dgk)
gene, complete cds, and cytidine deaminase (cdd) gene, partial cds.



NT 



AA 



ORF Name 



NTID 



AAID 



3A17.&38.5...±3....I4.Sl 



TIT 



Length Length 
TFTS 



BTT" 



Score Probability 
— 



|6.ie-264 



Protein name 

Description 
ATP SYNTHASE BETA CHAIN,



Locus Name 



sp:ATPB__fiACPP 



Acc# 



P13356 






NT 



AA 



ORF Name 



NT ID 



AAID Length Length



Score Probability 



34181B61 ti iy 



8 . 9e-52 



Protein name 




Locus Name 


Acc# 




1I5K outer membrane protein 




susc 


pir:JC6027 


JC6027 




protein 














Description 












— 


ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— Score 
Length 


Probability 




3.E6.250.u^r.2^.2 


331 




530 : 


L593 






Protein name 








Locus Name 


Acc# 




Description 














NO-HIT














ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 




m7.:/.b.i..±^..io.i 


332 


■ 5554 


864 


2595 404 


1.3e-36 





Protein name 



Locus Name 



DNA helicase homolog



IgpiA^iOBliB 



Acc# 



AF108138 



Description 



Homo sapiens DNA helicase homolog (PIF1) mRNA, partial cds.



ORF Name 



NTID 



~ — Score Probability 
AAID Length Length 



43.3.5.2^ 2^1^20.^. | |37T 

Protein name 



|i.7e-62 



Beta-N-Acetylglucosaminidase



Locus Name 
"I |gp;AB&l53bO- 



Acc# 
AB015350 



Description 

Streptomyces thermoviolaceus nagB gene for Beta-N-Acetylglucosaminidase,
complete cds.






ORF Name 



14454637 ±1 5 



Protein name 



NTID 



JTT~ 



dTDP-glucose 4-6-dehydratase:protein
slr0809:protein slr0809



Description 



NT 



AA 



AAID Length Length 



Score Probability 



715" 



948 



Locus Name 



pir:S75550



6.8e-i07 



Acc# 



S75550 



ORF Name 



Protein name 



Description 



NO-HIT



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



T7F" 



Locus Name 



Acc# 



it w 

u 



ORF Name 



Protein name 



Description 



NTID 



NT AA 

— — Score Probability 
AAID Length Length - 



1380 



2 .5e-25 



Locus Name 



|gp:EC0UW^ 



Acc# 



L10328 



E. coli; the region from 81.5 to 84.5 minutes.



NT 



AA 



ORF Name 



NTID 



ITT 



AAID Length Length 
5TT 



1644 



Score Probability 
ll.Se-ISfe 



Protein name 

Description 
P&ISMAKIK KiOTUlW ■ 



Locus Name 



spzPRISJMiflVH 



Acc# 



P31101 






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



5276557 i'2 y8 



T5T 



1 . 2e-15 



Protein name 



ATP synthase subunit b



Locus Name 
pir:H72231



Acc# 



H72231 



Description 



ORF Name 



NTID 



AAID 



— — Score Probability 
Length Length 



15561 



F7TT 



i.4e-17 



Protein name 



Locus Name 



F1F0-ATPase subunit delta



|gp:AF0y8b22 



Acc# 



AF098522 



Description 



Lactobacillus acidophilus uracil phosphoribosyltransferase (upp) gene,
partial cds; and F1F0-ATPase operon, complete sequence.



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



ITJ7" 



T~~"T 



6 . 4e-U6 



Protein name 



Locus Name 



sp:YF07_METJA' 



Acc# 
Q58902 



Description 
HYPOTHETICAL PROTEIN MJ1507







NT 


AA 


Score 


Probability 


ORF Name 


NTID AAID 


Length 


Length 








fi&aa&M^ai^i^b 


'541 5563 


95 




ZX6 


2.4e-17. 



Protein name 



Locus Name 



RNA-binding protein



Acc# 



D49425 



Description 

Anabaena variabilis rbpD gene for RNA-binding protein, complete cds.



# 



ORF Name 



NT ID 



AAID 



— — Score Probability 
Length Length 



6046881 fi 140 



5564 



JUT 



WIT 



|3.4e-64 



Protein name 



Locus Name 



3-methyl-2-oxobutanoate



gp:CGPAN" 



Acc# 



X96580 



Description 

C.glutamicum panB, panC & xylB genes,



ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


68360l5_c2_26b 


343 


5565 

5 


445 


1552 


281 


l.le-21 



Protein name 



Description 



Locus Name 



sp:NTRY_AZOCA



Acc# 



Q04850 



NITROGEN REGULATION PROTEIN NTRY,



ORF Name 



NT ID 



AAID 



7.D..72.0.3.7....CA..J.Z2... 



5566" 



— — Score Probability 

Length Length 

,240 



TIT 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 
31 1 [253 



Score Probability 



Locus Name 



Acc# 



Description 
NO-HIT






NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 



Score 
11070 



Probability 
13 .6e-I08 



Protein name 



Description 



Locus Name 



[sp:PUk T jj¥N¥i 



Acc# 
Q55336 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score 



1057762 12 l7b 



3T7" 



Probability 
S.Se-Sl 



Protein name 



Locus Name 



thio-specilxc antioxidant (tsa; peroxidase "J |pir:E72u55~ 

Description 



Acc# 



E72036 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
I7T" 



Score Probability 



^57u" 



fur 



Protein name 



Description 



Locus Name 



Acc# 



POTT 



ORF Name 



NTID 



AAID 



l lS.3.43.R^a^i?.a | 

Protein name 



F57T" 



— — Score Probability 
Length Length — 

0.021 



2W 



75" 



Locus Name 



ATP binding protein 



|gp:BBM'PBP 



Acc# 



X91965 



Description 



B. burgdorferi a£>p gene. 






NT 



AA 



ORF Name 



NTID 



AAID 



ci 191 



TS7T 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



TOT" 



AAID Length Length 
— 



Score 



!TuT" 



IT 



Probability 
0.020 



Protein name 



Locus Name 



pE66L 



gp:ASU18466 



Acc# 



U18466 



Description 



African swine fever virus, complete genome.



NT 



AA 



ORF Name 



NTID 



AAID 



±Z6J.llC).l..±l...iai I 



Length Length 
TTT 



Score 



TTT 



Probability 
. 5e-45 



Protein name 



Locus Name 



hypothetical protein



gp : MAAMYG 



Acc# 



X58627 



Description 



A. haloplanktis amy gene for alpha-amylase
1,4-alpha-D-glucan glucanohydrolase.



NT 



AA 



ORF Name 



NTID 



iiiflia3Lft...ca...4£fi i itot 



AAID Length Length 
— 



Score 



TTT 



Probability 
B.6e-22 



Protein name 



Locus Name 



single stranded DNA-binding protein 



gp:SSU64095 



Acc# 



U64095 



Description 



Shewanella sp. PT99 single stranded DNA-binding protein (ssb) gene, complete
cds.









NT 



AA 



ORF Name 



1367792 tJ 2^ 



NTID 
T53 



AAID Length Length 



Score Probability 



557F" 



T5T 



2.3e-26 



Protein name 



Description 



Locus Name 



Acc# 



P44118 



ORF Name 



NTID 



AAID 



13650S2 cl 320 



755" 



557T 



— — Score Probability 
Length Length 

1260 



TIT" 



3.fie-7S 



Protein name 



Locus Name 



autoaggregation-mediating protein



lgp:At'0yibU2 



Acc# 



AF091502 



Description 



Lactobacillus reuteri autoaggregation-mediating protein (aggH) gene,
complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



557F" 



Length Length 
T7T 



Score Probability 



T5T 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



— — Score Probability 



AA 



ORF Name 



NTID 



AAID Length Length 



75T* 



537T" 



TUT" 



TXT 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT






ORF Name 



|13914&0a t2 181 



Protein name 



Description 
NO-HIT



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



[TuX 



Locus Name 



Acc# 



ORF Name 



Protein name 
Description 



— — Score Probability 
Length Length 

TTT3 



AA 



NTID 



AAID 



Locus Name 



Acc# 



NO-HIT



ORF Name 



Protein name 



Description 



NTID 



AAID 



NT 

Length Length 



AA 

— , Score 



T£TT 



Locus Name 



Probability 



Acc# 



NO-HIT



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



VST 



T5T 



Score Probability 
i.5e-^ 



Locus Name 



sp:<X 4 ^AjJYA<JA 



Acc# 



P31564 



Description 

CYTOCJHkuME C MiO(JJjMii!£ji£j J^kuT EiM CC^k 






NT 



AA 



ORF Name 



NTID 



14660y^7 ci 494" 



AAID Length Length 



T75TT 



Score Probability 
|7.8e-18 



Protein name 



Description 



Locus Name 
gp:^YDLuh/W 



Acc# 



S.cerevisiae chromosome IV reading frame ORF YDLUb/w.



NT 



ORF Name 



NTID 



146658SJS c2 4bl 



AAID Length Length 



— Score Probability 



55FET 



1ST 



Protein name 
Description 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



i4S..7.5.iy.i...ci...^.B.y... 



7&T 



^5SS~ 



355" 



TuW 



0.00061' 



Protein name 



hypothetical protein 



Locus Name 
|gp;YEJsJli2y4b 



ACC# 



AJ132945 



Description 



Yersinia enterocolitica WA 314 right arm of the high-pathogenicity island.



ORF Name 



NTID 



15£±lb^t'^Llb. 



NT — _ , Score s Probability 

|3.Se-17 



AAID Length Length 
[T5T 



Protein name 

ss-DNA binding protein 12RNP2 precursor



Locus Name 



|gp:yYol:akWW 



Acc# 



D17359 



Description 



Synechococcus 6301 gene for ss-DNA binding protein 12RNP2, complete cds.






ORF Name 



NTID 



15660937 cl 345 



Protein name 



NT AA 

_ „ — , — . Score Probability 
AAID Length Length JL 



F5FB~ 



hypothetical protein 



Description 



FIT 



3.6e-60 



Locus Name 



pir:T33724 



Acc# 



T33724 



ORF Name 



Protein name 



Mag44 



Description 



NT 



AA 



TS7 



NTID AAID Length Length 

— 



TIT 



Score Probability 




17 .4e-09 



Locus Name 



|gp:DEPMAG44 



Acc# 



D17682 



Dermatophagoides farinae mRNA for Mag44, partial cds.



ORF Name 



NTID 



l£A47.aJ.5...ca...5&fl ...J f^F 



Protein name 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



T5T 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 
57¥" 



Score Probability 



T77T 



Locus Name 



Acc# 



Description 
POTT 






ORF Name 



cl JV1 



Protein name 



Description 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



T7TT 



TUTT 



nrrr 



7.8e-10 



Locus Name 



sp:PRIM_CLOAB



Acc# 



P33655 



DNA PRIMASE,



ORF Name 



Protein name 



NTID 



T7T" 



F1T3T 



hypothetical protein yycJ



Description 



NT 



AA 



AAID Length Length 



Score Probability 



Locus Name 



pir:A70090



1.2e-38 



Acc# 



A70090 



ORF Name 



17.0.10.a^U...al^i.b.y.., 



Protein name 



NTID 



T7T 



AAID 



— — Score Probability 
Length Length " 





TTT 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



18.0.3.0.2...aA...b.b.i*.. 



Protein name 



NTID 



\T7T 



AAID 



NT 



Length Length 



— score Probability 



73" 



Locus Name 



Acc# 



Description 



NO-HIT






ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


187WO_ai_b0y 


374 


5596 




1080 






Protein name 








Locus 


Name 


Acc# 


Description 
















NO-HIT 




ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 
~ 2.1e-l60 


l^Al^tl^J 


375 


| 5597 


674 


2025 


1563 





Protein name 



Locus Name 



branching enzyme



] j gp:Alj0^febT0" 



Acc# 



AB026630 



Description 

Emericella nidulans gene for branching enzyme, complete cds.



NT AA 
— — , Score 



ORF Name 



ia.7.i9.i..±i^zy.i.. 



NTID AAID Length Length 

TITS" 



S59S ' 



Probability 
3.1e-0b 



Protein name 



Locus Name 



ACC# 



P05695 



Description 

PORIN P PRECURSOR (OUTER MEMBRANE PROTEIN D1)



ORF Name 



NTID 



NT AA 
^ — Score 

AAID Length Length " 



|19.7.2.9.b.y.l...al^iB.U 



T7T 



T7T" 



FIT" 



Protein name 



Locus Name 



Probability 



Acc# 



Description 



NT 



AA 



ORF Name 
197346*** c2 423 



NTID 



AAID Length Length 



Score 



Probability 
10.021 



Protein name 



Locus Name 



two-component sensor histidine kinase homolog
ybdK



pir :F69747 



Acc# 



F69747 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score 



\TTT 



TTFT 



Probability 
Ii.2e-il8 



Protein name 



Locus Name 



sp:FSR_ECOLI



Acc# 



P52067 



Description 

FOSMIDOMYCIN RESISTANCE PROTEIN



ORF Name 



NTID 



AAID 



NT 

Length Length 



— , Score 



±9&bi&&.±±..a& 



TSTT 



Protein name 



Description 



Locus Name 



Probability 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



|4.8e-iti4 



Protein name 



Description 



Locus Name 



Acc# 



P36773 



ATP-DEPENDENT PROTEASE LA 1,






ORF Name 



20344056 t2 1B7 



Protein name 



NT ID 



— — Score Probability 



AAID Length Length 



[SIT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



|2.0.3.5L0.2b.0....c2..AO.b........ 



Protein name 



NT ID 



— M score Probability 



AAID Length Length 



7TT 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



T81T 



AAID 



NT 



AA 



Length Length 
552 



Score Probability 



T3T 



Locus Name 



Acc# 



Description 



NO-HIT






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



TS1T 



Score Probability 
0.044 



RT7 



Protein name 



Locus Name 



sp:YokbJlTVl 



Acc# 



P19280 



Description 

HYPOfHtfl'lCMi SJ.b Kb gftOTJjJiJ 



ORF Name 



NTID 



AAID 



NT 

Length Length 



— , Score 



2124506 cl 314 



Probability 
0.00014 — 



Protein name 



Locus Name 



transcription regulator phage-related homolog
ydcN



pir:C69774



Acc# 



C69774 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



3W 



TO" 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



ORF Name 



NTID 



AAID 



5611 



Length Length 
~JJJ 



AA . . 

— score Probability 



TXu~ 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






ORF Name 



NT ID 



Score Probability
AAID Length Length 



|216640bb ci b8b 



15612 



10.042 



Protein name 



Locus Name 



ATP synthase gamma chain 



Acc# 



AB027877 



Description 



Schizosaccharomyces pombe gene for ATP synthase gamma chain, partial cds,
clone:TA25.



ORF Name 



21677180 c^ bbb 



Protein name 



NT 



AA 



NTID AAID Length Length 

TTL 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



estrogen receptor 



Description 



— — Score Probability 
Length Length — 



TUT 



Locus Name 



pir:S26595



0.03.1 



Acc# 



S26595 



ORF Name 



Protein name 



NTID 



AAID 



T9T" 



hypothetical protein sirubb^ 



Description 



— — Score Probability 
Length Length — - • 



[T7T" 



1125 



Locus Name 



l.Oe-iy 



Acc# 



S77272 






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



FT 



Protein name 
Description 

ina^rTT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NT ID 



\220&1±£.2..±±..5.1 1 



AAID Length Length 
— 



TOT 



Score Probability 

m% — 



■/ . ye-4i 



Protein name 



Locus Name 



sp:GLNA_BACCE 



Acc# 



P19064 



Description 

GLUTAMINE SYNTHETASE, (GLUTAMATE--AMMONIA LIGASE)



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



22b£ll£.2...cl.,A$& I 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



iimaii..±2...iaa I ptt 



55T5T 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 






ORF Name 



NTID 



— — Score Probability 

AAID Length Length " 



22459802 ±2 154 



TUUT 



3.3e-7S 



Protein name 






Locus Name 


Acc# 


p-aminobenzoate synthase component I homolog


pir:F64187


F64187 


Description 












ORF Name NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


22&62Ml^a2^1b. 33 3 




178 


537 


93 


0.042 


Protein name 






Locus Name 


Acc# 








sp:TGN3_RAT


P19814 


Description 














"TEEHS-GOLGI NETWORK. INTEUkAL 


MEMBRANE PROTEIN 








ORF Name NTID 


AAID 


NT 
Length 


AA 

Length 


Score 


Probability 


21£5A^±±l..:^b. 400 


S&22 




828 






Protein name 






Locus Name 


Acc# 


Description 












NO-HIT










i 


ORF Name NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


^MS.S.aB.^cl^iB.y... 401 


5623 


" 126 


381 


87 


.0.014 | 



Protein name 



Locus Name 



unknown



l gp:Afr'U7439b 



ACC# 



AF074396 



Description 



Desulfotomaculum thermocisternum
UDP-acetylglucosamine 1-carboxyvinyltransferase (murA) gene, partial cds;
yydA, ferredoxin (fdx), dissimilatory sulfite reductase subunit A
(dsrA), dissimilatory sulfite reductase subunit B (dsrB), and dsrD
genes, complete cds; and unknown gene.






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability



2347217a 271 



TUT 



5624 



T7TT 



TTTT 



1 . 8e-122 



Protein name 



Locus Name 



Xylose isomerase 



|gp:RVLli24V2 



Acc# 



AJ132472 



Description 

Ruminococcus flavefaciens xylan utilization operon.



NT 



AA 



ORF Name 



NTID 



23597202 c3 bl3 



TUT 



AAID Length Length 
— 



lW 



Score Probability 
0.035 " 



Protein name 



Locus Name 



hypothetical protein F21D9.3



pir:T21205



Acc# 



T21205 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



\116.12ril.±±...bA 



TUT 



TZ5T 



Score Probability 
7 ,6e~60 



Protein name 



Locus Name 



xylulose kinase



gpiAi'Ooly'M 



Acc# 



AF001974 



Description 



Thermoanaerobacter ethanolicus putative TrkG gene, partial cds, and putative
TrkA, xylose isomerase (xylA) and xylulose kinase (xylB) genes, complete cds.



ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 


116.1&lti£...al...l£& 


405 


5627 




84 255 




6$ 


0.042 



Protein name 

Description 
HYPOTHETICAL PROTEIN



Locus Name 



sp:YC13_METJA



Acc# 



Q58610 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 
ST7 



Score Probability 



IT 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 



NTID 



AAID 



NT AA 
— — , S core 
Length Length 



TT5TT 



Probability 
i.9e-93 



Protein name 



Description 



Locus Name 



sp:TGT_BACSU



Acc# 



O32053



TRANSGLYCOSYLASE) (GUANINE INSERTION ENZYME)









NT 


AA 

— , Score 


Probability 


ORF Name 


NTID 


AAID 


Length 


Length 






lllX0&lb..±2..±8A 


40$ 


5630 


355 


10SS §2 




0.013 


Protein name 








Locus Name 




Acc# 



M protein precursor 



] |pir:5j6i081 " 



Description 



ORF Name 



NTID 



23.S.5.5.6.b.l...cl...i27. I 



Protein name 



AAID 



NT AA 
— — , Score 
Length Length — 



T3T 



Locus Name 



Probability 



Acc# 



Description 



NO-HIT






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 
TTT 



Score Probability 



TT 



0.016 



Protein name 



Locus Name 



MesF 



|gp:AF14i44J 



Acc# 



AF143443 



Description 



Leuconostoc mesenteroides plasmid pH*iU MesC (mesC) gene, partial cds; and
mesentericin B105 (mesB), MesH (mesH), and MesF (mesF) genes, complete cds.



ORF Name 



240271213 c2 4bU 



Protein name 



NTID 



Fsrr 



NT 



AAID Length Length 



AA 

Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



Description 



NT 



AA 



NTID 



AAID Length Length 




Score Probability 



Locus Name 



Acc# 



NO-HIT



ORF Name 



NTID 



AAID 



NT 
Length 



AA 
Length 



— — Score Probability 



WTT 



Protein name 



Locus Name 



Acc# 



Description 
NO-HIT






ORF Name 



NTID 



— — Score Probability 
AAID Length Length 



124305437 cl bb6 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 




Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



NTID 



AAID 



NT AA „ _ , , . . 

— — Score Probability 
Length Length 



Protein name 



ITT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



NTID 



Protein name 



AAID 



NT AA 

— — Score Probabi lity 
Length Length 



Locus Name 



0.007b 



Acc# 



Description 



P21867 






NT 



ORF Name 



NTID 



AAID Length Length 



— Score Probability 



2440bbS7 ±2 ib8 



5bTTT 



75^~ 



T5T" 



|4.2e-0B 



Protein name 

protein antigen LmSTI1



Locus Name 



gp:LMim84b 



Acc# 



U73845 



Description 



Leishmania major protein antigen LmSTI1 mRNA, partial cds.



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 




TIT 



Score Probability 
0.0029 



Protein name 



Locus Name 



putative repressor protein 



Acc# 
AJ242593 



Description 



Bacteriophage A118 complete genome.






NT 



ORF Name 



NTID 



AAID 



Length Length 
TU77 



— Score Probability 



Protein name 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



NTID 



AAID 



|2i5.0J.2B^...ai...b.ia I F^T 



55TT" 



— — Score Probability 
Length Length 

0.017 



55 



78 



Protein name 



hypothetical protein MJlbb4 



Locus Name 

pir:F64507 



Acc# 



F64507 



Description 






ORF Name 



NT ID 



24633357 cl 354 



TIT 



5644 



Protein name 



hypothetical protexn T27El3.e> 



Description 



NT 



AA 



AAID Length Length 
^WE 



Score Probability 
14 . 5e-62 



Locus Name 



pir:T00580



Acc# 



T00580 



ORF Name 



NT ID 



AAID 



NT AA score Probability 
Length Length 



246Au3.1£...g2..A&0. 



T5T 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



ORF Name 



NTID 



AAID 



Length Length 



AA 

— Score Probability 



Til 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT AA

— — Score Probability

AAID Length Length 

p£T7 



AA 



ORF Name 



NTID 



\2A6As:irL±iJi^b. I 



1.0e-126



Protein name 



Locus Name 



putative UDP-glucose dehydrogenase



gp:AF159428



Acc# 
AF159428 



Description 



Burkholderia pseudomallei putative UDP-glucose dehydrogenase (udg), putative
ADP-heptose synthase (waaE), and putative ADP-glycero-mannoheptose epimerase
(gmhD) genes, complete cds.






ORF Name 



2464^41^ tl 23 



Protein name 



NT ID 



NT 



AA 



AAID Length Length 
FDT 



Score Probability 



TFT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NT ID



WIT 



— — Score Probability
AAID Length Length 



[ST 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



|2i.7.1D.3.26...±1...2.y.y... 



Protein name 



NT 



NT ID 



AAID Length Length 



AA 

— Score Probability 



T7T 



TZT 



Locus Name 



thiol:disulfide interchange protein homolog yneN



Description 



pir:E69891



1.8e-07



Acc# 



E69891 



— — Score Probability 

NT ID AAID Length Length 

15551 



AA 



ORF Name 



WZ5 



ITWT 



| 2.0e-b^ 



Protein name 



Locus Name 



dTDP-6-deoxy-D-glucose-3,5 epimerase



gp:AF048749



Acc# 
AF048749 



Description 

Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.






ORF Name 



NT AA 

— — Score Probability
NT ID AAID Length Length



'247^8568 t2 219 



POTT 



1 im — i wn 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID AAID 



Length Length 
71 I H7S 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID AAID Length Length 

wn — 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID AAID Length Length 

wji — 



Score Probability 



1 im — I 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 






ORF Name 


NT ID 


AAID 




NT AA 
Length Length 


Score 


Probability 




434 


5555 




157 " 1074 


555 




Protein name 








Locus 


Name 


Acc# 










sp:YVAA_BACSU


O32223


Description 














^HYPOTHETICAL OXIDOMlbUcJ'l'At^ 


IN S'HUb- 


roPUBD INTERGJKJn 1C KlfiUlUJN 




ORF Name 


NTID 


AAID 




NT AA 
Length Length 


Score 


Probability 


25S7^O^_c2_404 


435 


5557 




lSl 545 


52 


0.044 



Protein name 



Locus Name 



envelope glycoprotein 



gp:AF021739



Acc# 



AF021739 



Description



HIV-1 isolate sing clone 4s from the Netherlands, envelope glycoprotein
(env) gene, partial cds.



ORF Name 



25.i2B.m..±^2.B.b... 



Protein name 



NTID 



AAID 



— — Score Probability 
Length Length 



tut 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



NT 



Length Length 



AA

— Score Probability 



IT5" 



Locus Name 



Acc# 



Description 
NO-HIT






ORF Name 



NT ID 



AAID 



— — Score Probability 

Length Length 



Protein name 



TTT 



Locus Name 



0.042 



Acc# 



hypothetical protein yopu



Description 



|pir: T 1^4^ 






ORF Name 



NT AA 

— — Score Probability 
NTID AAID Length Length 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



— — Score Probability 

NTID AAID Length Length 



Protein name 



70 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



— — Score Probability 
NTID AAID Length Length 



Protein name 



WL 1 



Locus Name 



Acc# 



Description 



zi 



NO-HIT 



ORF Name 



Protein name 



NTID AAID 



NT 



AA 



Length Length 
1 r2TT5 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



NT 



AA 



ORF Name 



NT ID 



AAID 



26601510 c2 448 



5665 



Length Length 
73— 



Score 



TUT 



Probability 
1.7e-05



Protein name 



Locus Name 



hypothetical protein MJ1608



pir:G64500 



Acc# 



G64500



Description 



NT 



AA 



ORF Name 



NT ID 



AAID 



KIT" 



Length Length 



3TJT 



Score Probability 




y . le-40 



Protein name 



Locus Name 



conserved hypothetical protein aq_1386



pir:F70420



Acc# 



F70420



Description 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



2.6.6.9.2.3A2...C.Z...&&6. I [¥T^ 



TUT 



ttit 



Protein name 



Locus Name 



succinate--CoA ligase (ADP-forming), beta chain



Description 



pir:H70439 



1.7e-65



Acc# 



H70439






ORF Name 



Protein name 

Description 
NO-HIT 



NT 



AA 



NTID 



AAID 



[ITS" 



Length Length 



Score Probability 



TT?T 



Locus Name 



Acc# 



ORF Name 



NTID 



AAID 



NT AA 

— — Score Probability 
Length Length



mflifl5...ca...safl I ft7 



555T 



STT" 



5T5* 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 
|44& 



AAID 



Length Length 
rm 1122772 — 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



AA 



ORF Name 



NTID 



AAID 



449 



Length Length 
73 



Score Probability 



TTT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 



NTID 



AAID 



NT AA
— — Score Probability 
Length Length 



2$22±&l...cl..A2± I 



450 



TTS~ 



2.1e-05 



Protein name 

Description 
HYPOTHETICAL PROTEIN HI1602



Locus Name 



sp:YG02_HAEIN



Acc# 



P44270 



NT 



AA 



ORF Name 



NTID 



AAID 



451 



Length Length 
7T~ 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NT ID 



29412301 tl 7a 



AAID Length Length 



5674 



Score Probability 
7.5e-14



TT5 



Protein name 



Locus Name 



sp:LSPA_STACA



Acc# 



Q59835 



Description 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


2S4700Si_ci_ji76 


455 


5575 


534 


1005 


102 


0.0029 



Protein name 



hypothetical protein PH0283 



Description 



Locus Name 



pir:D71453



Acc# 



D71453 



ORF Name 



NTID 



AAID 



NT AA
— — Score Probability
Length Length



3.0.2LB.M0.1...aA...b.2.b.. 



nur 



1.2e-109 



Protein name 



Locus Name 



cytochrome c peroxidase



gp:AF200362



Acc# 



AF200362 



Description 



Haemophilus ducreyi oxaloacetate decarboxylase gamma chain (oadG) gene,
partial cds; oxaloacetate decarboxylase alpha chain (oadA), oxaloacetate
decarboxylase beta chain (oadB), and alkylphosphonate uptake protein (phnA)
genes, complete cds; ccp gene, complete sequence; cytochrome c peroxidase
gene, complete cds; and unknown gene.



NT 



AA 



ORF Name 



NTID 



AAID 



|3jQSjfJj84Jbi...±i...l„ 



Length Length 
TW2 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT 






NT 



AA 



ORF Name 



NT ID 



AAID 



'31672502 11 24 



Length Length 




Score Probability 
0.00032



TIF" 



Protein name 

type I restriction enzyme hsdM: hypothetical
protein H91_orf543: hypothetical protein
H91_orf543



Locus Name 



pir:S73820



Acc# 



S73820 



Description 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


32035$67_c3_507 


" 457 


5674 


121 


366 


161 


7.6e-12 



Protein name 



hypothetical protein



Locus Name 
|gp:SOTl^9iO 



Acc# 



Y18930 



Description 

Sulfolobus solfataricus 281 kb genomic DNA fragment, strain.



NT 



AA 



ORF Name 



NTID 



llieA'./Al^L^.L 



AAID Length Length 



Score Probability 
4.3e-85



Protein name 



Locus Name 



succinate--CoA ligase (ADP-forming), alpha chain



pir:F69719



Acc# 



F69719 



Description 



NT 



AA 



ORF Name 



NTID 



nasaaa^ci-Jtoflt | 



AAID Length Length 
T7£T — 



POTT 



Score Probability 
IB .Se-lS 



Protein name 

hypothetical protein TMibtlu 



Locus Name 



pir:G72227



Acc# 



G72227 



Description 






ORF Name 



34017140 4y8 



Protein name 



NT ID 



AAID 



5682 



— — Score Probability 
Length Length 



m 1 ITTO 



Locus Name 



Acc# 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



|3.40.7.12b.0...±l...b.4.. 



Length Length 



77T 



Score Probability 
1.4e-19 



2T£ 



Protein name 



Locus Name 



sp:YT29_MYCTU 



Acc# 



P71564 



Description 

fOTATlVli CXX£)ORii!DU(J TA5E RV044S, 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



2AH6A6.2...C.2...A1.8... 



472 



TZTT 



7.3e-140



Protein name 



Description 



Locus Name 



Acc# 



sp:UXAC_ECOLI



ISOMERASE)



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 
771 



Score Probability 



T5T 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



NTID 



NT AA 

— — Score Probability
AAID Length Length



344™06502 c2 403 



TTET 



1ST" 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



1440.7..7.ai..±i...b.5. I BSS 



Length Length 




Score Probability 
TT2 



7.8e-07



Protein name 

Description 
HYPOTHETICAL PROTEIN MJ0374



Locus Name 



sp:Y374_METJA 



Acc# 
Q57819 



ORF Name 



NTID 



NT AA 

— — Score Probability
AAID Length Length



A££l£Sa.7...±l..J.4 1 



3.9e-127



Protein name 



Locus Name 



'sp:YHC^6AC 4 SU 



Acc# 



P54608 



Description 

HYPOTHETICAL 60.2 KD PROTEIN IN CSPB-GLPP INTERGENIC REGION



NT 



AA 



ORF Name 



NTID 



3.6.3.3.7.5.6.5l...c3....5.5.3„ 



^7" 



AAID Length Length 



Score Probability 
0.00020



91 



Protein name 



Locus Name 



regulatory protein CsgD 



gp:EC0CURLI2 



Acc# 



AF081826 



Description 

Escherichia coli csg cluster, partial sequence.






ORF Name 



NTID 



— — Score Probability



AAID Length Length 



FT5F 



Protein name 



macrolide-efflux determinant



Locus Name 
gp:SPU83667



Acc# 



U83667 



Description 

Streptococcus pneumoniae macrolide-efflux determinant (mefE) gene, complete
cds.



ORF Name 



3536838 ci i7i 



Protein name 



NTID 



NT 



AA 



AAID Length Length 
355 



— Score Probability 



TTT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 
Description 



NTID AAID Length Length 



NT — Score Probability



Locus Name 



Acc# 



NO-HIT



ORF Name 



Protein name 



NT 



AA 



FT7T 



NTID AAID 
5553 



Length Length 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT







ORF Name 


NT ID AAID 




NT 
Length 




AA 
Length 


Score 


Probability 


4065760_ti_6i 


472 5594




255 


768 


239 


4.1e-20 


Protein name 










Locus Name 


Acc# 


hypothetical protein


pir:S75926


S75926 


Description 


ORF Name 


NTID AAID 




NT 
Length 




AA 
Length 


Score 


Probability 


i0..7.26.&:L±l...B.l 


472 5695 




773 


2322 


121 


1.5e-05 


Protein name 










Locus Name 


Acc# 


outer membrane protein 


gp:NGU81959


U81959 


Description 


Neisseria gonorrhoeae outer membrane protein (omp85) gene, complete cds.


ORF Name 


NTID AAID 




NT 
Length 




AA 
Length 


Score 


Probability 


±±5.15.16.^1^16. 


474 5696 






267 


77 


0.018


Protein name 


Locus Name 


Acc# 


hypothetical protein ZCJ4 7.1 


pir:T27592


T27592 


Description 


ORF Name 


NTID AAID 




NT 
Length 




AA 

— , Score 
Length 


Probability 


4mtttt!*»±2^17A 


475 5697 


509 


1530 


13/1 


4.6e-140 


Protein name 




Locus Name 


Acc# 


xylose transporter 


gp:AB009593


AB009593 



Description 



Tetragenococcus halophilus rbsC, rbsK, xylR, xylA, xyxu and xym genes,
partial and complete cds.






ORF Name 



434501^ t'A lbb 



Protein name 



Description 



NT 



AA 



NTID 



AAID Length Length 



— Score Probability 



Locus Name 



Acc# 



P03020 



ORF Name 



145^288 tl 4b 



Protein name 



NTID 



NT 



AA 



AAID Length Length 



Score Probability 



Locus Name 



Acc# 



Description 



ORF Name 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 
TIT 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



Description 



NTID 



AAID 



15701 



NT 



AA 



Length Length 
7ZU 



— Score Probability 



\TTT 



Locus Name 



Acc# 



NO-HIT






ORF Name 



14791400 ±2 134 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 
£T5 



Score Probability 



7T 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID



NT AA

— — Score Probability
AAID Length Length



43..7.:7.1Ab...±A...2.?.9. 



1149 I 13450 



i.7e-2il 



Locus Name 



isoleucine--tRNA ligase, ileS: isoleucyl-tRNA
synthetase: isoleucyl-tRNA synthetase



Description 



pir:H70203



Acc# 



H70203 



ORF Name 



Protein name 



NTID 



I 



NT 



AA 



AAID Length Length 



Score Probability 



crei — I 



Locus Name 



Acc# 



Description 
NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



probable phosphoserine phosphatase



Description 



NT AA

— — Score Probability
Length Length 



tzht 



Locus Name 



pir:T36772



1.3e-fe4 



Acc# 



T36772 



ORF Name 



NTID 



AAID 



\B.6A011...a^A^l 



Protein name 
Description 



NT AA 

— — Score Probability
Length Length 



73" 



2TT 



Locus Name 



Acc# 



NO-HIT






NT 



AA 



ORF Name 



NTID 



5022037 cl JiV 



AAID Length Length 



F7TT7 



1062 



Score Probability 
9.6e-76



TFT 



Protein name 



Locus Name 



Acc# 



sp:YHIM_ECOLI



Description 



NT 



AA 



ORF Name 



NTID 



.6347188 c3 !d88 



AAID Length Length 
F£5T2 — 



Score Probability 



T61 



1.3e-08



Protein name 



Description 



Locus Name 



gp:AB008550



Acc# 



AB008550 



Pseudomonas aeruginosa phage phi CTX, complete genome sequence.



NT 



AA 



ORF Name 



NTID 



£8.M8.o.:/...±i...:^ 



AAID Length Length 
"" 



Score Probability 



TIT 



3.1e-0£ 



Protein name 



Locus Name 



probable dnaK suppressor 



pir:D71366



Acc# 



D71366 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



8.1.7.8.2..7....al..S.ib.., 



Protein name 



Locus Name 



rRNA methylase homolog ysgA



pir:G69984



Acc# 



G69984 



Description 






NT 



AA 



ORF Name 



NT ID 



AAID 



829436 ci bib 



5711 



Length Length 
\TUT2 — 



Score Probability 
^KS 



2.3e-58



Protein name 



Locus Name 



Acc# 



protein kinase homolog Tnx



gp:AF070520



AF070520 



Description 

Sinorhizobium meliloti protein kinase homolog Tnx (tni) and ExoF-like
protein genes, complete cds; and unknown genes.



ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


$4637_ci_366 


490 


5712 


68 


207 








Protein name 








Locus 


Name 


Acc# 




Description 


















NO-HIT 






ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 




5.8.&M2L:A..±i....2B.:A ... 


491 


5713 


66 


201 








Protein name 








Locus


Name 


Acc# 




Description



NO-HIT


ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 




aa£2&a.i..ai...i;te 


492 


5714


413 


1242 


110 


3.3e-14



Protein name 



Locus Name 



sp:YBGH_ECOLI



Acc# 



P75742 



Description 

HYPOTHETICAL 54.2 KD PROTEIN IN kHkH-NEI INTERGENIC REGION






ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


10320312_J:2_49 


493 




32S 


987 






Protein name 








Locus 


Name 


Acc# 


Description 














NO-HIT


ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


lftfiaim...cA.ab t l 


494 


5716 


103 


312 


114


7.3e-07



Protein name 

hypothetical protexn AJPElifob 



Locus Name 
pir:H72586



Acc# 



H72586 



Description 



ORF Name 


NT ID AAID 


NT 
Length 


AA 

— . Score 
Length 


Probability
1.8e-10


iaa2B.b.b.:A..±3....ay. 


495 5717 


204


615 149 




Protein name 






Locus Name 


Acc# 


conserved hypothetical protein




pir:C72361


C72361 


Description 


ORF Name 


NTID AAID 


NT 
Length 


AA 

— Score 
Length 


Probability 


ms.5.8.a:A...c^...iyJ 


496 5718


64 


195 




Protein name 






Locus Name 


Acc# 


Description 










NO-HIT


ORF Name 


NTID AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 
1.3e-32


liUl&:/...±d...au 


497 5719 


316


951 357 




Protein name 






Locus Name 


Acc# 



gp:AB012956



AB012956 



Description 

Vibrio cholerae genes for O-antigen synthesis, strain MO45, complete cds.






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



14485841 ±3 yb 



"ZTT 



1.3e-64 



Protein name 



Locus Name 



rubrerythrin



gp:AF202316



Acc# 



AF202316 



Description 



Moorella thermoacetica rubrerythrin gene, complete cds.



NT 



AA 



ORF Name 



NTID 



144S1537 tl 22 



AAID Length Length 



Score Probability 

0.012 



TUT 



Protein name 



Locus Name 



comEA protein-related protein 



pir:F72301



Acc# 



F72301 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



rrzr 



Length Length 
FT77 



Score Probability 



TTO" 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



"5TTT" 



TT3~ 



5T" 



0.0 IT 



Protein name 



Locus Name 



hypothetical protein mvu.i 



pir:T33032



Acc# 



T33032 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



FOX 



Length Length 

s^u — 



Score Probability 



\rnr 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






ORF Name 



NT ID 



158136 ti 7b 



FOT" 



Protein name 



conserved hypothetical protein



Description 



NT 



AAID Length Length 



— Score Probability 



|4.6e-3V 



Locus Name 



pir:G72409



Acc# 



G72409



ORF Name 



|16.0.b.blbJ....Gd...^b.. 



Protein name 



NTID 



— — Score Probability 

AAID Length Length 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



FOB 



NT 



AAID Length Length 



— Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 
3.5e-ib 



Locus Name 



sp:Y516_BORBU



Acc# 



O51468



Description 

HYPOTHETICAL TRNA/RRNA METHYLTRANSFERASE BB0516,






NT 



AA 



ORF Name 



NTID 



AAID 



20S09632 ti 31 



Length Length 
TTT7 — 



Score Probability 



Protein name 



Locus Name 



dihydrolipoamide
dehydrogenase: 2-oxoglutarate dehydrogenase
rnmplP.y n ^ in ^ra^f.oin dehydrogenase comply 



pir:I40794



Description 



1.2e-S2 



Acc# 



I40794



ORF Name 



20^3143 c'A ISA 



Protein name 



NTID 



TOT 



NT 



AA 



AAID Length Length 
252 



Score Probability 



Locus Name 



Acc# 



Description
NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



— — Score Probability 
Length Length 



1ST 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



2Lil&2Llll...tl...3. 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



Flu" 



5732 



3HT 



3TT 



25T" 



3.4e-2b 



Locus Name 



putative oxidoreductase



gp:SCF76



Acc# 



AL121600 



Description 



Streptomyces coelicolor cosmid F76.






NT 



AA 



ORF Name 



NTID 



AAID 



122575537 ±1 11 



FIT" 



Length Length 



Score Probability 



li.4e-33 



Protein name 



Locus Name 



conserved hypothetical protein ysnA 



pir:C69986



Acc# 



C69986 



Description 



NT 



AA 



ORF Name 



NTID 



ZT2T 



AAID Length Length 



Score Probability 
8.3e-05



127 



Protein name 



Locus Name 



outer membrane protein tolC precursor (tolC)
RP224



pir:H71733



Acc# 



H71733 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



5TT 



Length Length 




Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
T53T — 



Score Probability 
|3.0e-i2S 



Protein name 



Locus Name 



sp:YGFH_ECOLI



Acc# 



P52043 



Description 

HYPOTHETICAL KB PROT EI N I N ■ SKM-PBA IWTBRtf ENIC REGION (0492) 






NT 



AA 



ORF Name 



NTID 



AAID 



123986267 £1 19 



15737 



Length Length 
TIF 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



\2ll±6.$.&l..±2...5±.. 



AAID Length Length 
1374 



S71F 



WET 



Score Probability 
ia.Se-12 



194 



Protein name 



Locus Name 



chromosomal hemolysin D 



gp:AF081284



Acc# 



AF081284 



Description 



Escherichia coli strain CFT073 chromosomal hemolysin D (hlyD) gene, partial
cds; and Hp1 (hp1), Hp2 (hp2), Hp3 (hp3), and Hp4 (hp4) genes, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



\2122$.±C).2..±1...±6.., 



Length Length 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



Z426.Q9.5.2....C.3....2.2.1.. 



SIB 



Length Length 
^ 



Score Probability 



84 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






NT 



AA 



ORF Name 



NTID 



24409662 c2 166 



AAID Length Length 
TTE 



57TT 



TT7T" 



Score Probability 
TU1 



6 .Se-11 



Protein name 



Locus Name 



iron-uptake factor



gp:AF051690 



Acc# 
AF051690 



Description 



Pseudomonas aeruginosa iron-uptake factor (piuC),
hydroxamate-type ferrisiderophore receptor (piuA), and iron-uptake factor
(piuB) genes, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



24415875 £2 55 



52TT 



Length Length 
TST7 — 



Score Probability 
531



7.2e-6$ 



Protein name 



Locus Name 



arylsulfatase



gp:PAATSASN



Acc# 



Z48540 



Description 



Pseudomonas aeruginosa atsR, atsB, atsC & atsA genes. 



ORF Name 



NTID 



AAID 



NT AA 

— — Score Probability
Length Length



244,933.3. .7....C.2...13.Q.. 



52T 



IT5T7" 



I.Oe-SS 



Protein name 

Description 
QUINOLINATE SYNTHETASE A



Locus Name 



sp:NADA_SYNY3



Acc# 



P74578 



ORF Name 



NTID 



AAID 



NT AA 

— — Score Probability
Length Length



TFT 



Protein name 
Description 

I) (ALPHA-L-FUCOSIDE FUCOHYDROLASE)



Locus Name 



sp:FUCO_RAT



Acc# 



P17164 






ORF Name 



24713351 t2 37 



Protein name 



NT 



AA 



NT ID 



AAID Length Length 




TUT" 



Score Probability 

\rm 



Locus Name 



prolipoprotein diacylglyceryl transferase
(lgt) RP046



Description 



pir:F71712



l.ie-34 



Acc# 



F71712 



ORF Name 



NT ID 



NT AA
— — Score Probability
AAID Length Length



"5T4~ 



T4T" 



7HT 



TTJe^TT 



Protein name 



Locus Name 



chloramphenicol acetyltransferase



gp:AF124757



Acc# 



AF124757 



Description 



Zymomonas mobilis fosmid clone 43D2, complete sequence.



NT 



AA 



ORF Name 



NT ID 



AAID 



5747 



Length Length 



Score Probability 
FI 



0.0020 



Protein name 



Locus Name 



sp:EREB_ECOLI 



Acc# 



P05789 



Description 
ERYTHROMYCIN ESTERASE TYPE II, 



ORF Name 



NTID 



AAID 



NT AA 
— — Score Probability
Length Length 



[T74~ 



I114S 



2.0e-116



Protein name 

Description 
REGION 



Locus Name 



sp: YVAJMiACSU 



Acc# 



P37518 






ORF Name 



26757637 t3 88 



Protein name 



NTID 



NT AA 

— — Score Probability
AAID Length Length



^7T 



Locus Name 



hemolysin secretion protein hlyB: protein
sll1180: protein sll1180



Description 



pir:S75806



1.2e-86 



Acc# 



S75806 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



2..7.4142S...±3....5.S. I 



573TT 



2 .ie-32 



Protein name 
Description 

PUTATIVE PENICILLIN BINDING PROTEIN PRECURSOR



Locus Name 



sp:PBP_BACSU



Acc# 



P39844 



NT 



AA 



ORF Name 



NTID 



2.22.a3.Q..7,...c3....2.15.. 



535" 



AAID Length Length 
5751 



1584 



Score Probability 
■|2.4e-i20 



1185 



Protein name 



Description 



Locus Name 



Acc# 



sp:NADB_PSEAE 



L-ASPARTATE OXIDASE, (QUINOLINATE SYNTHETASE B)



NT 



AA 



ORF Name 



NTID 



AAID 



3.3.3.S.2I0..7...±l...i3. I VSJU 



5752 



Length Length 



Score Probability 
£T7 



S.le-22 



Protein name 

Description 
HYPOTHETICAL PROTEIN HP0117 



Locus Name 



sp:Y117_HELPY 



Acc# 



P56080 






NT 



AA 



ORF Name 



NT ID 



AAID 



34376679 cl 141 



Length Length 




Score Probability 

rnn — 



1.2e-164



Protein name 

Description 
DNA MISMATCH REPAIR PROTEIN MUTS



Locus Name 



sp:MUTS_HAEIN



Acc# 
P44834 



NT 



AA 



ORF Name 



NTID 



AAID 



14054128 t'2 bl 



Length Length 



T5W 



Score Probability 
0.026 



T5 



Protein name 



Locus Name 



erythromycin esterase homolog ybfO



pir:A69750



Acc#
A69750 



Description 



ORF Name 



NTID 



NT AA 

— — Score Probability
AAID Length Length



|1.2e-1.0i 



Protein name 



Locus Name 



putative protein 



gp:ATAP22



Acc# 



Z99708 



Description 



Arabidopsis thaliana DNA chromosome 4, ESSA I AP2 contig fragment No. 2.



NT 



AA 



ORF Name 



NTID 



AAID 



4m&3.7...±3„..&6.., 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NT ID 



14422752 ±2 52" 



AAID Length Length 




TTT 



Score Probability 
TT2 



Protein name 



Locus Name 



putative glucosyl transferase



gp:AF105116



Acc# 



AF105116 



Description 



Streptococcus pneumoniae type 19C Cps19CR (cps19CR) gene, partial cds;
putative oligosaccharide repeat unit transporter (cps19CJ), UDP-N-acetyl
glucosamine-2-epimerase (cps19CK), and putative glucosyl transferase
(cps19CS) genes, complete cds; and glucose-1-phosphate thymidylyl transferase
(cps19CL) gene, partial cds.



ORF Name 



Protein name 

Description 
NO-HIT



NT 



AA 



NT ID 



AAID 



L45.4.7.0£3....c2...151... J IHTF 



Length Length 



Score Probability 



Locus Name 



Acc# 



ORF Name 



Protein name 



Description 



NT 



AA 



NT ID 



AAID Length Length 




Score Probability 
S7 



0.0057 



Locus Name 



sp:PBP4_HAEIN



Acc# 
P45161 



ORF Name 



Protein name 



NTID 



probable sulfate transporter 



Description 



NT 



AA 



AAID Length Length 




Score Probability 
1.9e-111



TTUT 



Locus Name 



pir:A71463 



Acc# 



A71463 






ORF Name 



NT ID 



AAID 



NT AA

— — Score Probability
Length Length 



5985875 c:4 '220 



^75" 



5.9e-60 



Protein name 



Locus Name 



ferrichrome-iron receptor 3: protein
slr1490: protein slr1490



pir:S74457



Acc# 



S74457 



Description 



ORF Name 



NTID 



5TTT 



Protein name 



hypothetical protein PAB1767 



Description 



NT 



AA 



AAID Length Length 
1119 



TTT 



Score Probability 
3.7e-09



Locus Name 



pir:B75136



Acc# 



B75136 



ORF Name 



Protein name 



NTID 



5TT" 



NT 



AA 



AAID Length Length 
TI32 — 



Score Probability 



T7T 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



3.7.0^.8.0...±l...la., 



Protein name 



NTID



NT AA

— — Score Probability
AAID Length Length



F42~ 



^T5~ 



S.ie-^l7 



Locus Name 



putative leucyl tRNA synthetase



gp:AF069441



Acc# 



AF069441 



Description 



Arabidopsis thaliana BAC TISBiV from chromosome IV, near 19.3 cM, complete
sequence.






NT 



AA 



ORF Name 



NT ID AAID Length Length 
"SITS 



T5T 



Score Probability 




!!>.!>e-57 



Protein name 



Locus Name 



putative glycosyl transferase 



gp:AF048749 



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



NT 



AA 



ORF Name 



1054637 ±3 214 



NT ID AAID Length Length 




2TT" 



Score Probability 
1033 



3.0e-104



Protein name 



Locus Name 



superoxide dismutase



gp:BMRS0D2 



Acc# 



D13756 



Description 



Bacteroides fragilis DNA for superoxide dismutase, complete cds.



NT 



AA 



ORF Name 



NT ID 



10.7.5.0.0.S..7....c2...445.„ 



AAID Length Length 
F7^7 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



ORF Name 



NT ID 



NT AA 

— — Score Probability
AAID Length Length



10M.lBAC)....c,2J19A I |5¥f 



1WT 



FOT" 



7.1e-a0. 



Protein name 



Locus Name 



alpha-D-glucose-1-phosphate



gp:YEPASCA 



Acc# 



L27130 



Description 



Yersinia pseudotuberculosis alpha-D-glucose-1-phosphate cytidylyltransferase
(ascA) gene, complete cds.



NT 



AA 



ORF Name 



NT ID 



ioajyaav ci 303 



AAID Length Length 
S7^3 



T7T" 



Score Probability 
— 



Protein name 



Locus Name 



CDP-glucose-4,6-dehydratase



pir:D47070



Acc# 



D47070 



Description 



ORF Name 



NT ID 



NT AA 
— — Score Probability
AAID Length Length



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



U.2L1AQ3lZ...CZ...3L7.7. 



AAID Length Length 
F77T — 



T7T 



Score Probability 
^T7 



1.8e-83



Protein name 



Description 



Locus Name 



sp:ATOC_ECOLI



Acc# 



Q06065 



DECARBOXYLASE INHIBITOR) (ORNITHINE DECARBOXYLASE ANTIZYME)



NT 



AA 



ORF Name 



NTID 



AAID 



11£3.22&D...±2...&&... I 



Length Length 
TUT 



T2¥" 



Score Probability 
T52 



1.2e-10



Protein name 

Description 
CBIK PROTEIN 



Locus Name 



sp:CBIK_SALTY



Acc# 
Q05592 






ORF Name 



125375 t3 197 



Protein name 



NT ID 



NT AA 

— — Score Probability 
AAID Length Length 



1503 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



NTID 



1 



Protein name 



NT 



AA 



AAID Length Length 
TTT 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



13.&3.48.12...cl...i&L. 



Protein name 



NTID 



NT 



AA 



AAID Length Length 



Score Probability 



1044 



Locus Name 



Acc# 






Description 



NO-HIT



ORF Name 



Protein name 



ThiH 



NTID 



NT AA 

— — Score Probability 
AAID Length Length



14 . 3e-W 



Locus Name 



gp:AF154064



Acc# 



AF154064 



Description 

Salmonella typhimurium ThiH (thiH) gene, complete cds.






NT 



AA 



ORF Name 



NTID 



144&y0b0 12 180 



^3" 



AAID Length Length 
F7T7 



1668 



Score Probability 
TT7 



4.7e-07



Protein name 



Locus Name 



aspartate aminotransferase



pir:D75496



Acc# 



D75496 



Description 



NT 



AA 



ORF Name 



NTID 



i^.^.2.za.7....ti...ia., 



AAID Length Length 
5778 



F7W 



Score Probability 
— 



8.6e-194



Protein name 

Description 
THIAMINE BIOSYNTHESIS PROTEIN THIC 



Locus Name 



Acc# 



sp:THIC_BACSU



NT 



AA 



ORF Name 



NTID 



\±±6A110£...±1...211 1 IS37 



AAID Length Length 
TTT9 — 



77T" 



Score Probability 
" 



0.021



Protein name 



Locus Name 



conserved hypothetical protein MTH469



pir:D69161 



Acc# 



D69161 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 



Score Probability 

m 



0.024 



Protein name 

Description 
PROTEIN K



Locus Name 



sp:GESK_E<!OLI 



Acc# 



P02988 






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



14867327 13 236 



1227 



T7T" 



1.0e-10



Protein name 



Locus Name 



Acc# 



sp:YIGN_ECOLI



P27850 



Description 

HYPOTHETICAL 54.7 KD PROTEIN IN UDP-RFAH INTERGENIC REGION PRECURSOR



% 3 



ORF Name 



NT ID 



NT AA n _ 
— — Score Probability 
AAID Length Length 



16064015 c2 585 



TOT 



4^3 



0.0060



Protein name 



Locus Name 



Acc# 



trbA protein 



pir:A49852



Description 



ORF Name 



NTID 



NT AA 

— , — ■ Score Probability 
AAID Length Length — 



H&4fl43i...ai-..aaa I pur 



5783 



5.0e-47



Protein name 



Locus Name 



conserved hypothetical protein HP0162 



pir:B64540



Acc# 



B64540 



Description 



ORF Name 



NT 



AA 



NTID 



AAID Length Length 




7W 



fZTTT 



Score Probability 
2.8e-144



Protein name 

Description 
ATP-DEPENDENT HELICASE PCRA



Locus Name 



sp:PCRA_BACST



Acc# 



P56255 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 




TFT 



Score Probability 

m — ~ 



0.0023 



Protein name 



Locus Name 



hypothetical protein MJ1608 



pir:G64500 



Acc# 



G64500 



Description 



NT 



AA 



ORF Name 



NTID 



iA£&afi7.a...Gi...aa7. I ps* 



T — t , T — Score Probability 
AAID Length Length JL 

3735 



5.0e-05 



Protein name 



Locus Name 



unknown 



gp:AF144879



Acc# 



AF144879 



Description 



Leptospira interrogans rfb locus, complete sequence.



NT 



AA 



ORF Name 



NTID 



_ _ _ _ — _ — ^, Score Probability 
AAID Length Length JL 

S7F7 



2.3e-130 



Protein name 



Locus Name 



CDP-4-keto-6-deoxy-D-glucose-3-dehydratase



(gpTYF^T7TT 



Acc# 



AJ251713 



Description 



Yersinia pestis strain EV76 hemH gene (partial) and O-antigen gene cluster
for ddhD gene, ddhA gene, ddhB pseudogene, ddhC gene, prt gene, wbyH gene,
wzx gene, wbyI pseudogene, wbyJ gene, wzy pseudogene, wbyK gene, gmd
pseudogene, fcl pseudogene, manC gene, wbyL gene, manB gene, wzz gene and gsk
gene (partial)...



NT 



AA 



ORF Name 



NTID 



AAID 



S788 



Length Length 



Score Probability 
T73 



2.1e-34 



Protein name 



Locus Name 



hypothetical protein jhp0094 



pir:E71975



Acc# 



E71975 



Description 



ORF Name 



NT ID 



AAID 



— — , Score Probability 
Length Length 



TOT 



S7S9 



957 



3 .^e-I46 



Protein name 



Locus Name 



putative UDP-GlcNAc:undecaprenylphosphate



gp:AF048749



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



20520302 c5 462 



T7W 



6e-0.fi 



Protein name 



Locus Name 



immunoreactive 50kD antigen PG53



gp:AF175720 



Acc# 



AF175720 



Description 



Porphyromonas gingivalis strain W50 immunoreactive 50KD antigen PG53 gene,
complete cds.



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length - J ~ 



2iasa&fliJi..±i.„i. 



TOT 



9.9e-33 



Protein name 



Locus Name 



ferrichrome-iron receptor 3: protein
slr1490: protein slr1490



pir:S74457



Acc# 



S74457 



Description 



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID 



S7TT 



Length Length 




Score Probability 



3* 



Locus Name 



Acc# 



Description 
NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



2150505 cl 308 



Length Length 



Score Probability 
1.1e-20



174 



Protein name 



Locus Name 



UDP-glucose-4-epimerase/dTDP-glucose-4,6



gp:AF048749



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



NT 



AA 



ORF Name 



NTID 



22114755 ti 7 



S7T 



AAID Length Length 
T3T7 



F7M 



T7W 



Score Probability 
T53 



1.8e-35 



Protein name 



Locus Name 



precorrin-6Y methylase: protein
sll0099: protein sll0099



pir:S76697



Acc# 



S76697 



Description 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length — 



Protein name 



S7T 



5795 



V5T 



Locus Name 



Acc# 



Description 



ORF Name 



Protein name 



NTID 



571" 



AAID 



NT 



Length Length 



AA 

— Score Probability 



Tin- 



Locus Name 



Acc# 



Description 



NO-HIT 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 



Score Probability 



Protein name 
Description 

NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



aasfiL22fi2...ci...4aft I ftf 



Length Length 



Score 



Probability 
1.4e-33



Protein name 



Locus Name 



putative glycosyl transferase



gp:AF071085



Acc# 



AF071085 



Description 



Enterococcus faecalis strain OG1RF polysaccharide biosynthetic gene cluster,
partial sequence.



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length — JL 



\116±X1&1..±±>..1& I 



fZUT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



2.3.6.15.2Q5....C.3....52.3.., 



Length Length 
— 



409 



Score Probability 



TFf 



Protein name 

Description 
HEXOKINASE TYPE III (HK III)



Locus Name 



sp:HXK3_HUMAN



Acc# 



P52790 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



23617802 r2 IbO 



0.0054 



Protein name 



Locus Name 



orrib 



pir:T41782



Acc# 



T41782 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



5STT 



¥3T 



TT2TT 



1.8e-11



Protein name 



Locus Name 



conserved hypothetical protein yknkl 



pir:E69858



Acc# 



E69858 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 




Score Probability 



95 



Protein name 



Locus Name 



Acc# 



Description 



NT 



ORF Name 



NTID 



AAID 



Length Length 



AA 

— Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 
4.2e-82



WIT 



Protein name 



Description 



Locus Name 



Acc# 



sp:THMJWOJLl 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 
1TD 



JUT 



Score Probability 
0.0011 



35T 



Protein name 



Locus Name 



chaperone GrpE type 2 



gp:AF098636



Acc# 



AF098636 



Description 



Nicotiana tabacum chaperone GrpE type 2 (grpE2) mRNA, nuclear gene encoding
mitochondrial protein, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



.24023462 cl 3ll 



tz$t 



Score Probability 
2.9e-60



Protein name 



Description 



Locus Name 



sp:YDAR_BACSU



Acc# 



P96593 



HYPOTHETICAL 4b. 7 KB PROTEIN IN MUTT-c^ IB INTlilkc^ENIC kJjOlotl 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 



Score Probability 
0.0018 



Protein name 



Locus Name 



unknown protein



gp^CCXVlObK 



Acc# 



X95258 



Description 



S.cerevisiae 10.6 kbp fragment from chromosome XV.



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
1347 



Score Probability 
l.ge-82 



TOT 



Protein name 



Locus Name 



Na+/H+-exchanging protein: Na+/H+ antiporter

pir:JX0360



Acc# 



JX0360 



Description 



ORF Name 



NTID 



NT AA 

— — , Score Probability 
AAID Length Length J - 



1014 



0.0058 



Protein name



Description 



Locus Name 



bp^COkHSEX 



Acc# 
L19083 



Escherichia coli RhsE genetic element; defective RhsE core protein, complete
cds; complete ORF-E2; H-rpt subelement; complete ORF-H.



ORF Name 



Protein name 



NTID 



AAID 



^TI- 



NT 



AA 



Length Length 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



Description 



NTID 



AAID 



NT AA n ^ , , . _ . ^ 
— ■ — , Score Probability 
Length Length — lL ~ 



WIT 



12484 



TlW 



1.1e-120



Locus Name 



Acc# 



sp:SYFB_ECOLI



TRNA LIGASE BETA CHAIN) (PHERS)



NT 



AA 



ORF Name 



NTID 



AAID 



5L4410.7.au..±l.,.7.1..„ 



Length Length 




Score Probability 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT






ORF Name 



.24412912 £2 94 



Protein name 



Description 



NT 



AA 



NTID 



AAID Length Length 
573 



Score Probability 



Locus Name 



Acc# 



NO-HIT 



ORF Name 



\2&S.DA1±1.±1..2()A.. 



Protein name 



NTID 



hypothetical protein MTH671



Description 



NT 



AA 



AAID Length Length 



Score Probability 



TIT 



Locus Name 



pir:D69189



1.1e-25



Acc# 



D69189






NT 



AA 



ORF Name 



NTID 



24&lM15....r.l...n I 



AAID Length Length 




Score Probability 
1.7e-30 



TT7 



Protein name 



Locus Name 



Acc# 



sp:YLYB_BACSU



Description 

HYPOTHETICAL 33.7 KD PROTEIN IN LSP-PYRR INTERGENIC REGION (ORF-X)



$8% 



ORF Name 



NTID 



NT AA 

— — , Score Probability 
AAID Length Length — 



2.5e-61 



Protein name 



Locus Name 



precorrin-3 methylase 



pir:A64497



ACC# 



A64497 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



24542311 c2 392 



T5T 



2.1e-18



Protein name 



Locus Name 



unknown 



gp:AF048749



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence. 



ORF Name 



2464SS63 cl VA2 



Protein name 



NT 



AA 



NTID AAID Length Length 

un — 



Score Probability 



Locus Name 



Acc# 



in 



Description 



NO-HIT



ORF Name 



2.46.5.4Qli..±1...3. 



Protein name 



NTID 



cobalamin biosynthesis protein N



Description 



NT 



AA 



AAID Length Length 



Score Probability 



3981 



Locus Name 



pir:C69048



l.Se-115 



Acc# 



C69048 



ORF Name 



Protein name 



NTID 



NT AA „ „ , ,. -. . . 

— — , Score Probability 
AAID Length Length 



5821 



hypothetical protein AF04bb 



Description 



T7T 



Locus Name 



pir:H69306



3.ie-oy 



Acc# 



H69306 






NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 
33~~ 



Score Probability 



Pro- 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



\ 1&1L113±.±1J111 1 [SuT 



Length Length 



TZTT 



Score Probability 




JTTe^TT 



Protein name 



Locus Name 



probable membrane protein b0847



pir:G64822



Acc# 



G64822 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 



X7W" 



Score Probability 




2.5e-196 



Protein name 
Description 



Locus Name 



sp:LEPA_BACSU



Acc# 



P37949 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length — 



\25.B.lBmi..±1...20... 



7TT 



1.1e-49



Protein name 



Locus Name 



MPT-synthase sulfurylase



|gp:SY£GCM0Sl3 



Acc# 



Y16560 



Description 

Synechococcus PCC7942 moeB gene.



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length ^ 



TUT 



ITT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



AA 



ORF Name 



NTID 



aamaa2„.ai...aft4 



TUT 



AAID Length Length 



T7TT 



Score Probability 
3.4e-89



Protein name 



Description 



Locus Name 



gp:AF025396 



Acc# 



AF025396 



Vibrio anguillarum rfb region, partial sequence.



NT 



AA 



ORF Name 



NTID 



AAID 



10.5.11$£6..±2...±1Z I FuT 



Length Length 



7WT 



Score Probability 
5.5e-07 



T33 



Protein name 

Description 
REGULATORY PROTEIN TENI



Locus Name 



sp:TENI_BACSU



Acc# 



P25053 



NT 



AA 



ORF Name 



NTID 



3.2I£5.M.5...±l...b. 



AAID Length Length 




TUT 



T3TT 



Score Probability 

lo.ooia 



32 



Protein name 



Locus Name 



hypothetical protein MTH670 



pir:C69189



ACC# 



C69189 



Description 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 
— 



\JUT 



WIT 



Score Probability 
KJM 



1.7e-142



Protein name 



Locus Name 



glucose-1-phosphate thymidyl transferase



gp:AF048749



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



ORF Name 



NTID 



AAID 



34064010 t3 232 



Protein name 



RNA methyl transterase homo log yet A 



Description 



NT 



AA 



Length Length 



Score 



TOT 



TUT 



Probability 
1.1e-65



Locus Name 



pir:E69793



Acc# 



E69793 



ru 



ORF Name 



Protein name 



NTID 



WW 



NT AA 
T — — — _ Score Probability 
AAID Length Length ^ 



5832 



FuT 



Locus Name 



Acc# 



Description 



NO-HIT






ORF Name 



Protein name 



NTID 



NT AA 

**-r« t — , -i _ — Score Probability 
AAID Length Length — 



1&11155&...Q1..A&2 1 [£TT 



SWT 



4.7e-18



Locus Name 



ADP-L-glycero-D-manno-heptose-6-epimerase 



pir:G70330



Acc# 



G70330 



Description 






NT 



AA 



ORF Name 



NT ID 



AAID 



Length Length 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



AA 



ORF Name 



NTID 



\1±15A£3±.±1...2±6 1 



AAID Length Length 



Score Probability 
1.4e-2S 



319 



Protein name 



Description 



Locus Name 



sp:THIE_SYNY3



Acc# 



P72965 



PYROPHOSPHORYLASE) (TMP-PPASE) (THIAMINE-PHOSPHATE SYNTHASE)



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length — 



|6.fie-i3 



Protein name 



Locus Name 



glucosyl transferase 



gp:SMU52844



Acc# 



U52844 



Description 



Serratia marcescens putative glycosyltransferase,
putative glycosyl transferase, putative heptosyl III transferase
(waaQ), 3-deoxy-manno-octulosonic acid transferase (waaA),
glucosyltransferase (waaE), and KdtB (kdtB) genes, complete cds; and
Fpg (fpg) gene, partial cds.



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length — — 



TTT 



0.0002s 



Protein name 



Locus Name 



unknown 



gp:AF007381



ACC# 



AF007381 



Description 

Flavobacterium johnsoniae gliding motility protein (gldA) gene, complete
cds; and unknown genes.



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



36330078 ci blB 



96 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



ORF Name 



NTID 



NT AA 
— , — Score 
AAID Length Length 



3.^DAM2..±3....21bu. 



^T7" 



TUT 



T5T 



TIT 



Probability 
2.8e-07 



Protein name 
Description 

Salmonella typhimurium fragment STMFI.



Locus Name 



gprSTyaTMW. 



Acc# 



AF170176 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



3.5.3.5.S.12...cl...3ib.. 



Score Probability 
4.0e-281



Protein name 



Description 



Locus Name 



sp:f>0M^hO^ 



Acc# 



P22983 



DIKINASE)



ORF Name 



NTID 



NT AA 
— , — , Score 
AAID Length Length 



WTT 



1419 



Probability 
|9.3e-.<>0 



Protein name 



Locus Name 



precorrin-3 methylase



gp:BMAJVb« 



Acc# 



AJ000758 



Description 

Bacillus megaterium 16kb genomic sequence, cobalamin operon.






NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 



Score Probability 
6.7e-84



Protein name 



Locus Name 



dTDP-6-deoxy-D-glucose-3,5 epimerase



gp:AF048749



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence. 



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length — wL - 



40550 tS 144 



T4ir 



wit 



T5T 



3.2e-36



Protein name 



Locus Name 



conserved hypothetical protein 



pir:C75256



Acc# 



C75256 



Description 



ORF Name 



±±9£M±...cx2...19±.. 



Protein name 



NTID 



WIT 



AAID 



5844 



NT AA 

— . — , Score Probability 
Length Length ™ 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



NTID 



\±ll±±b.b....al„A6:i I IBTJ 



Protein name 



AAID 



NT AA 

— j , , — _ Score Probability 
Length Length — ^ 



1311 



TIT 



Locus Name 



3.5e-19 



Acc# 



Description 
NITROGEN REGULATION PROTEIN NTRB



sp:NTRB_RHOCA



P09431 



ORF Name 



NTID 



NT AA 

— — Score Probability 
AAID Length Length 



0.0012 



Protein name 



Locus Name 



unknown 



gp:AF007381



Acc# 



AF007381 



Description 



Flavobacterium johnsoniae gliding motility protein (gldA) gene, complete
cds; and unknown genes.



ORF Name 



14881512 tl 72 



Protein name 



525 



NT 



AA 



NTID AAID Length Length 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



|48.81^.7....a2....4ai.. 



Protein name 



NT 



AA 



NTID 



AAID Length Length 




Score Probability 



Na+/H+-exchanging protein sll0689: Na+/H+
antiporter: Na+/H+ antiporter



Description 



TUT 



Locus Name 



pir:S74414



0.016 



Acc# 



S74414 



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID Length Length 
TIT" 



Score Probability 

0.0030 — ~ 



TU3 



Locus Name 



growth-associated protein



gp:ZEFGAP



Description 

Brachydanio rerio growth-associated protein, complete cds.



Acc# 



L27645 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



4884635 c3 4bV 



SF5TT 



TUT 



1.7e-37 



Protein name 
unknown 



Locus Name 



gp:AF144879



Acc# 



AF144879 



Description 



Leptospira interrogans rfb locus, complete sequence.



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


4&84712_c2_40l 


629 


585l 


254 


765 


636 


3.5e-62 



Protein name 



Locus Name 



exodeoxyribonuclease



pir:B69126



Acc# 



B69126 



Description 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 




Score 


Probability 


&$535±l...a3....S.&6 


630 


5852 


193 


582 








Protein name 








Locus 


Name 


Acc# 


Description 
















NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 




Score 


Probability 


5.0.47.5.7.^^3.7.3 


631 


5353 


956 


2871 




385 


3.0e-34 



Protein name 



Locus Name 



RcsC 



gp:AF071215



Acc# 



AF071215 



Description 

Proteus mirabilis regulator of swarming behavior precursor (rsbA) and RcsB
(rcsB) genes, complete cds; and RcsC (rcsC) gene, partial cds.






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



5110325 c3 521 



TTTT 



1054 



1.8e-106 



Protein name 



Locus Name 



Acc# 



carboxynorspermidine decarboxylase: protein
sll0873: protein sll0873



pir:S77268



S77268 



Description 



NT 



AA 



ORF Name 



NTID 



5.112a0.2...ti...20.7.. 



AAID Length Length 
644 



5TT51d 



Score Probability 
2.8e-39



Protein name 



Locus Name 



CbiD protein 



Acc# 



AJ000758 



Description 



Bacillus megaterium 16kb genomic sequence, cobalamin operon.



ORF Name 



Protein name 



NTID 



FIT" 



NT AA „ ^ -i -i • -i ' ±_ 
— — , Score Probability
AAID Length Length 



5*£T5^~ 



75" 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



NT AA 

— — - Score Probability
Length Length 



^T5" 



ITS"5"7~ 



^5TT 



1.4e-58 



Locus Name 



hypothetical protein



pir:S22614



Acc# 



S22614 



Description 



ORF Name 



7087642 ci 10b 



Protein name 



NTID 



"NTT AA 

— — Score Probability 

AAID Length Length 



IT8W 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



Description 



NT 



NTID 



AAID Length Length 



AA 

— Score Probability 



TOT" 



WIT 



8.8e-S6 



Locus Name 



sp:AMP1_SYNY3



Acc# 



P53579 



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



ITT 



1011 



1.1e-11



Locus Name 



conserved hypothetical protein MTH1261 



pir:F69035



Acc# 



F69035



Description 



ORF Name 



Protein name 



NTID 



— — Score Probability 
AAID Length Length 



m&s±bb...a±..Akti I fts 



0.014 



Locus Name 



sp:YBJZ_ECOLI



ACC# 



P75831 



Description 

HYPOTHETICAL ABC TRANSPORTER ATP-BINDING PROTEIN YBJZ






NT 



AA 



ORF Name 



NTID 



11737^0 C2 81 



AAID Length Length 

mi — 



Score Probability 
TTZe^m, 



Protein name 



Locus Name 



hypothetical protein PH16 7U 



pir:F71047



Acc# 



F71047 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 




Score Probability 



TFT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



— — Score Probability 
Length Length ' 





ORF Name 



NTID 



AAID 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



ORF Name 



NTID 



AAID Length Length 



AA 

— Score Probability 



T7F" 



1.1e-12



Protein name 



Locus Name 



serine-rich protein



pir:T39903



Acc# 



T39903 



Description 



ORF Name 


NTID 


AAID 


NT AA 
— — Score 
Length Length 


Probability 


2LD.7A10.b.3...±^...^b. 


644 


S3 6 6 


240 723 1219 


S.9e-l:i4 


Protein name 






Locus Name 


Acc# 


Sa« 






gp:AF116251


AF116251 



Description 

Bacteroides fragilis batI operon, complete sequence.






ORF Name 



209638 t'3 ^ 



Protein name 



Description 



NT 



AA 



NTID 



AAID Length Length 
TW2 



Score Probability 



Locus Name 



Acc# 



NO-HIT



ORF Name 



Protein name 



Description 



NTID 



NT AA 

— Score 

AAID Length Length 



\2.10l&1..±1J±x I 



1106a 



Probability 
3.7e-67



Locus Name 



Acc# 



P43764 



(GLYCOPROTEASE)



ORF Name 



Protein name 



BatB 



Description 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 
13 .le-iOa 



TuTT" 



Locus Name 



gp:AF116251



Acc# 



AF116251 



Bacteroides fragilis batI operon, complete sequence.



NT 



AA 



ORF Name 



NTID 



AAID 



2m.7.1iu...tl...! I 



Length Length 



Score Probability 
|3.6e-7fo 



7^r 



Protein name 



Description 



Locus Name 



sp:FTSY_HAEIN



Acc# 



P44870 



CELL DIVISION PROTEIN FTSY






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score 



23649052 11 V 



Probability 
1.2e-11



Protein name 



Description 



Locus Name 



sp:Y531_METJA 



Acc# 



Q57951 



HYPOTHETICAL PROTEIN MJ0531



NT 



AA



ORF Name 



NTID 



23834576 c2 84 



AAID Length Length 




Score Probability 
1.0e-07 



Protein name 



Locus Name 



hypothetical protein APE1982



pir:H72500 



Acc# 



H72500 



Description 



ORF Name 



NTID 



AAID 



NT 

Length Length 



AA 

— ■ , Score 



1848 



3076 



Probability 
uTT3 



Protein name 



Locus Name 



BatD 



gp:AF116251



Acc# 



AF116251 



Description 



Bacteroides fragilis batI operon, complete sequence.



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score 



M4;0.3.5.3..7....t2....2Ll.. 



S3" 



TTT 



Probability 
i.Se-13 : 



Protein name 



Locus Name 



ribosomal protein L28



pir:E64104



Acc# 



E64104 



Description 






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score 



2441b903 tl 7 



Probability 
4.0e-141



Protein name 



Locus Name 



BatE 



gp:AF116251



Acc# 



AF116251 



Description 

Bacteroides fragilis batI operon, complete sequence.



ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


24S2356i>J:l_^ 


654 




6§ 


201 


161 


2.£>e-ll 



Protein name 
Description 

CHLOROPLAST 50S RIBOSOMAL PROTEIN L33



Locus Name 



Acc# 



P49565 



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 
RTSu — 



Score Probability 



1129 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



1.6e-12



Protein name 



Locus Name 



antigen 332 



ACC# 



JN0292 



Description 






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



314306^ c± 40 



12544 



TT7TT 



Protein name 



Locus Name 



DNA gyrase A subunit



gp:AB017712 



Acc# 
AB017712 



Description 



Bacteroides fragilis gyrA gene for DNA gyrase A subunit, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



33357127 tl b 



335" 



2.9e-170



Protein name 



Locus Name 



BatA



gp:AF116251



Acc# 



AF116251 



Description 



Bacteroides fragilis batI operon, complete sequence.



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
F7T3 



Score Probability 
6.7e-36



Protein name 



Locus Name 



conserved Hypothetical protein BBOi/b 



pir:G70121



Acc# 



G70121 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



1.3e-y4 



Protein name 



Locus Name 



conserved hypothetical protein aq_84y 



pir:E70373



Acc# 



E70373 



Description 



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



34564376 if 1 



MI 



1.1e-58



Protein name 



Locus Name 



hypothetical protein



pir:S76561



Acc# 



S76561 



Description 



ORF Name 



NT ID 



— — Score Probability 



AAID Length Length 



3.9.m25...±i...M.. 



ITT 



1005 



8.8e-82 



Protein name 



probable moxR protein 



Locus Name 



Acc# 



B70874 



Description 



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



45.6.153..7....CA...B.:/. 



\ZZT 



\ZTT 



TZTT 



TP" 



6.3e-0$ 



Protein name 



Locus Name 



conserved hypothetical protein aq_«b4 



pir:B70374



Acc# 



B70374 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



5.3.45.6.1:/....!^^.. 



TUT 



4.2e-11



Protein name 

Description 
DNA-BINDING PROTEIN HU



Locus Name 



sp:DBH_THEMA



Acc# 



P36206 






ORF Name 



7072675 ±4 36 



Protein name 



BatB 



Description 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



Locus Name 



gp:AF116251



1.4e-67



Acc# 



AF116251 



Bacteroides fragilis batI operon, complete sequence.



ORF Name 



10562517 t3 VB 



Protein name 



NTID 



NT 



AA 



AAID Length Length 




Score Probability 



7T~ 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



lipase-like protein



Description 



— — Score Probability 
Length Length 



T53T - 



TIT" 



Locus Name 



pir:A64706



2.6e-26



Acc# 



A64706 



ORF Name 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 
7UI 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



hypotnetical protein bH0b-3U 



Description 



NT 



AA 



AAID Length Length 



Score Probability 



TIT 



Locus Name 



pir:A70166



2.7e-16 



Acc# 



A70166 






ORF Name 



17086686 t3 9H 



Protein name 



Description 



NT 



AA 



NTID 
pi? 



AAID 



15892 



Length Length 



Score Probability 



Locus Name 



Acc# 



NO-HIT 



ORF Name 



2flSftfiDl6A..±l...lSA-. 



Protein name 



NTID 



AAID 



hypothetical protein jhp1380



Description 



NT 



AA 



Length Length 
WTT 



Score Probability 
1.8e-16



Locus Name 



pir:G71815



Acc# 



G71815 



ORF Name 



NTID 



NT AA „ _ , , . , . . 
— — Score Probability 
AAID Length Length 



WTT 



KIT 



THT" 



TUT" 



2.7e-05



Protein name 



Locus Name 



cytochrome b



gp:AF017516



Acc# 



AF017516 



Description 



Bombus pascuorum cytochrome b (cytb) gene, mitochondrial gene encoding
mitochondrial protein, partial cds.



NT 



AA 



ORF Name 



NTID 



WTT 



AAID Length Length 




Score Probability 
1.8e-106



Protein name 



Description 



Locus Name 



sp:CBIP_SALTY



Acc# 



Q05597 



COBYRIC ACID SYNTHASE







NT 

ORF Name NT ID AAID Length 


AA 
Length 


Score 


Probability 


|242691B0 r2 VI m SS^T" 400 1203 


263 


" 2.9e-21 


Protein name 


Locus Name 


Acc# 


"hypothetical protein ;jnpi3 7S> j 


pir:F71815


F71815


Description 


ORF Name NT ID AAID Length


AA 
Length 


Score 


Probability 


zmnnnzEJL...s3L b'/s sstj 321 966 




5.5e-4i 


Protein name 


Locus Name 


Acc# 


nicotinate-nucleotide--dimethylbenzimidazole


pir:A75577


A75577 


phosphoribosyltransferase








Description 








ORF Name NTID AAID Length 


AA 
Length 


Score 


Probability 


2S3aai£.7...±l»A 616 58 9 8 206 621 


310 


1.2e-27 


Protein name 


Locus Name 


ACC# 


cobinamide kinase / cobinamide phosphate


pir:S52220


guanylyltransferase








Description 








ORF Name NTID AAID Length 


AA 
Length 


Score 


Probability 


2Ml?.m...cd...l^ *TT ^ ^2 1509 


122a 


5.7e-I2b 


Protein name 


Locus Name 


ACC# 


proline--tRNA ligase, proS: prolyl-tRNA


pir:A70150


A70150 


synthetase: prolyl-tRNA synthetase





Description 






NT 



AA 



ORF Name 



NTID 



'24641^0J c2 166 



AAID Length Length 



Score Probability 



57JT" 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



TUUT 



Score Probability 



Protein name 



Locus Name 



immunoreactive 36 KDa antigen PG14



gp:AF145798



Acc# 



AF145798 



Description 



Porphyromonas gingivalis strain W50 immunoreactive 36 KDa antigen PG14 gene,
complete cds.



NT 



AA 



ORF Name 



NTID 



2±&22h&&.±2..&6. J 



AAID Length Length 



ATT 



Score Probability 
7.3e-07



Protein name 



Locus Name 



hypothetical protein



pir:S76776



ACC# 



S76776 



Description 



ORF Name 



25401437 ci lib 



Protein name 



Description 



NTID 


AAID 


NT 
Length 


AA 
Length 




Score 


Probability 




5904 


165 


498 




178 


2 . oe-u 








Locus 


Name 




Acc# 



sp:YJJP_HAEIN



P44520 



HYPOTHETICAL PROTEIN HI0108



NT 



AA 



ORF Name 



NTID 



AAID 



c2 14b 



Length Length 




TTT 



Score Probability 
5.3e-20 



238 



Protein name 



Locus Name 



sp:YJJP_ECOLI



Acc# 



P39402 



Description 

HYPOTHETICAL 30.5 KD PROTEIN IN DNAT-



BGLJ INTERGENIC REGION (F277)



NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


584 


5 £06 


580 


1743 


1106 


5 . be-112 



ORF Name 



3.15.5..7.0.a.0....a2...144 



Protein name 



Locus Name 



Acc# 



spiYID^kJOuLl 



Description 
HYPOTHETICAL b8.9 K b PROTEIN IN (iLA/cJ 



-1SPB INTHP^ENIO RHci lON (okfc'A) 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



m2&4!x2...aA...m i 



1.2e-24 



Protein name 



Locus Name 



conserved hypothetical protein yvqK



pir:D70046



Acc# 



D70046 



Description 



ORF Name 



NT ID 



— — Score Probability 



AAID Length Length 



1WT 



3.7e-11



Protein name 



Locus Name 



probable phosphoglycerate mutase



Description 



pir:B75539



Acc# 



B75539 



ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


Mi3.5.aay....c3....iy.u 


6§7 5905 


32b 


978 


4S9 


1.3e-46



Protein name 



Description 



Locus Name 



Acc# 



P21634 



COBD PROTEIN



ORF Name 
42212b...±l...lfo... 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



TTF 



0.0025 



Locus Name 



beta-tropomyosin



pir:S23470



Acc# 



S23470



Description 



ORF Name 



|4S.S.9.D.2..7....ci...ly.b... 



Protein name 



NTID 



AAID 



tricorn protease 



Description 



NT 



AA 



Length Length 
— 



Score Probability 
ii.3e-^y — 



Locus Name 



gp:TAU72850



Acc# 



U72850 



Thermoplasma acidophilum GTP-binding protein and tricorn protease (tri)
genes, complete cds.






ORF Name NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


48u7062_cl_I15 590 




448 1347 




2.1e-54 


Protein name 


Locus Name 


Acc# 


cobyrinic acid a,c-diamide


synthase 




pir:A75619


A75619 


Description 


ORF Name NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


4M£5.1...c^...U9.. 591 


1 5313 


821 2466 


435 


1.3e-37 


Protein name 


Locus Name 


Acc# 


two component sensor 


gp:AF030352


AF030352 


Description 


Pseudomonas aeruginosa two


component 


sensor (lemA) gene, partial cds.


ORF Name NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


£116.S.ii6....cl...Ul 592 


5914 


289 870 


296 


3.8e-26 


Protein name 


Locus Name 


Acc# 


CobD 


gp:STU90625


U90625 



Description 



Salmonella typhimurium alpha-ribazole-5'-phosphate phosphatase CobC (cobC)
gene, partial cds and putative aminotransferase CobD (cobD) gene, complete
cds.



NT 



AA 



ORF Name 



NTID 



5.mb.M..±l...M.. 



AAID Length Length 
3T2 



Score Probability 



5^TF 



TUT 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
753 



BSD" 



Score Probability 
3.5e-23



Protein name 



Locus Name 



cobalamin synthase



pir:H75576



Acc# 



H75576 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
3TJI 



Score Probability 



53T7 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



F3TF 



Score Probability 
4.3e-09



Protein name 



Locus Name 



hypothetical protein 



gp:SSU18930



Acc# 



Y18930 



Description 



Sulfolobus solfataricus 281 kb genomic DNA fragment, strain P2,



NT 



AA 



ORF Name 



537 



NTID AAID Length Length 
S3T3 



Score Probability 



TTJT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 






ORF Name 



(1199075 C2 'Abl 



Protein name 



NTID 



NT 



AA 



AAID Length Length 
S3 - 



Score Probability 



T55" 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



Description 



NTID 



AAID 



vrrp 7\ 7\ 

— — Sc ore Probability 
Length Length 



[OT7T 



Ii.7e-i08 



Locus Name 



Acc# 



ORF Name 



Protein name 



NTID 



YTTTT 



AAID 



NT 



Length Length 



— score Probability 



Locus Name 



Acc# 



Description 
NO-HIT 



ORF Name 



Protein name 



NTID 



7TTT" 



AAID 



— — Score Probability 

Length Length — 

STT7 



TO 



Locus Name 



Acc# 



Description 



NO-HIT 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



137862S1 ti 



1.8e-33



Protein name 



Locus Name 



histidine kinase



gp:AF114442 



Acc# 



AF114442 



Description 

Nostoc punctiforme histidine kinase (hepK) gene, complete cds.



— — Score Probability 
AAID Length Length — 

— 



AA 



ORF Name 



NTID 



114636063. c3 3bb 



[7TTT 



ITTTTT 



P5W 



|4.9e-8S 



Protein name 



Description 



Locus Name 



|sp:HI^7_UAhliN 



Acc# 



P44327 



NT 



AA 



ORF Name 



NTID 



AAID 



7M" 



Length Length 

mn — 



Score Probability 
|i.ie-282 



T7TT 



Protein name 



Locus Name 



B12-dependent



gp:ECOUW89



Acc# 



U00006 



Description 

E. coli chromosomal region from 89.2 to 92.8 minutes.



ORF Name 


NTID 


AAID 


NT AA 

— Score 
Length Length 


Probability 




705 


5927


7515 — 2367 1213 




^ . ye-i^y 


Protein name 






Locus Name 




Acc# 












P52155 


Description 
















ORF Name 


• 

NT ID 


AAID 


NT 
Length 


• 

AA 

— , Score 
Length 




Probability


1562964ii_t2_79 


706


5928 


380 1143 113 




0.0057 


Protein name 










Locus Name 




Acc# 










gp:PFMAL3P2 






Description 
















Plasmodium falciparum MAL3P2,


^, W 1 1 \f~s Jl. \Z ^ 


sequence 








ORF Name 


NT ID 


AAID 


NT 
Length 


AA 

— - , Score 
Length 


Probability 


±§S54S2_c3_340 


707 


5929 


133 402 88 




U.UUZJ 


Protein name 










Locus Name 




Acc# 










gp:SYCPURT




L36958 


Description 


















Synechocystis sp. (clone pSYN411) glycinamide ribonucleotide transformylase
(purT), Orf134 and dnaA genes, complete cds, photosystem II reaction center
protein D2 (psbD) gene, 5' end.




ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


2.ullM£.2..±l...lb^ 


708 


5930 


642 1929 iue>y 






Protein name 










Locus Name 




ACC# 


hypothetical protein RV2438C 




pir:D70680 


D70680 


Description 
















ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 




709 


5931 


289 870 101 






Protein name 










Locus Name 




Acc# 


hypothetical protein 




gp:AF021091 




AF021091 



Description 



Helicobacter pylori hypothetical protein (HP0395), hypothetical protein
(HP0394), chemotaxis protein CheV (cheV), bifunctional chemotaxis protein
CheF (cheF), chemotaxis protein CheW (cheW), and adhesin-thiol peroxidase
TagD (tagD) genes, complete cds; and superoxide dismutase SodB (sodB) gene,
partial cds.






NT 



AA 



ORF Name 



NTID 



2161286 t3 IbU 



7TTT 



AAID Length Length 




Score Probability 



Protein name 



Locus Name 



Acc# 



Description 
NO-HIT



NT 



ORF Name 



NTID 



AAID 



2±6A±b.b:A....a2J21± 



7TT 



Length Length 



7\ 7V 

— Score Probability 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT 



ORF Name 



NTID 



AAID 



NT 

Length Length 



AA 

— • , Score 



Probability 



TIT 



2031 



l . 5e-i3 



Protein name 



Description 



Locus Name 



sp:PLEC_CAUCR



Acc# 



P37894 



NON-MOTILE AND PHAGE-RESISTANCE PROTEIN,



ORF Name 



NTID 



— — Score Probability 



22$A22bJL..^..J±3... 



HIT 



AAID Length Length 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






ORF Name 



NTID 



NT AA 

, „ — , — _ Score Probability 
AAID Length Length ~ JL 



123477187 tl ±4 



TIT" 



ITT 



b.ye-37 



Protein name 



Locus Name 



pir:I40328



ACC# 



I40328



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 
TT1 



Score Probability 



Protein name 
Description 

NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



n5ai7.5ifli..±i.M.. 



7TT 



5938 



Length Length 



Score Probability 



2151 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



7TT 



Length Length 
3T7~ 



Score Probability 
Wl 



Protein name 



Locus Name 



hypothetical protein PH0161 



pir:G71237 



Acc# 



G71237 



Description 



ORF Name 



2A25.242L7....C.3....3.8.1.. 



Protein name 

Description 
NO-HIT



NTID 



AAID 



7T3 - 



NT 



AA 



Length Length 



Score Probability 



7W 



Locus Name 



Acc# 






ORF Name 



NTID 



NT AA 

Score Probability
AAID Length Length 



7TT 



15941 



TT5T 



Protein name 



Locus Name 



conserved hypothetical protein MTH8B4 



Description 



pir:B69218



8.0e-12 



Acc# 



B69218 



ORF Name 



2±A0M±l...al..M±... 



Protein name 



NTID 



AAID 



TTT 



TMT 



NT AA 

— — Score Probability 
Length Length 



5T - 



TTT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



TIT 



NT 



AA 



Length Length 
290 



Score Probability 



VTT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



2l6Am&a...a>±...2±L. 



Protein name 



NTID 



AAID 



TIT 



5944 



NT 



AA 



Length Length 
T35" 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



TTT 



5945 



NT 



AA 



Length Length 



Score Probability 



TUZT 



Locus Name 



ACC# 



Description 



NO-HIT 






ORF Name 



2464Sb^ cl 'ASS 



Protein name 



Description 



NT ID 



TIF 



NT 



AAID Length Length 



AA 

— Score Probability 



TOT 



7.4e-30 



Locus Name 



sp:Y746_METJA



Acc# 
Q58156 



HYPOTHETICAL PROTEIN MJ0746



ORF Name 



124650502 rl lb 



Protein name 



NTID 



NT 



AA 



AAID Length Length 



Score Probability 



TIT 



12172 



Locus Name 



Acc# 



Description 



ORF Name 



NT 



NTID 



AAID Length Length 



AA 

— Score Probability 



2.46.S0.y.lZ...c2....1i^ I 



l^TST 



Protein name 



Locus Name 

sodium/proline symporter (proline permease)
pir:C69115
Description 



1.0e-144



Acc# 



C69115 



ORF Name 



Protein name 



NTID 



247.9.8Ab.7...±3....16.2. 



7TT 



— — Score Probability 



AAID Length Length 



SUIT 



Locus Name 



Acc# 



Description 



NO-HIT 







ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 




Score 


Probability 




T48065S7_cl_221 


728 


5950 


570 2013 




233 


1.2e-21 




Protein name 








Locus 


Name 


Acc# 












sp:DSBD_HAEIN


P44919 




Description 


















BIOGENESIS PROTEIN CYCZ)




ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 




Score 


Probability 




24853385_c2_:m 


723 




343 










Protein name 








Locus 


Name 


Acc# 




Description 


















NO-HIT


















ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 




Score 


Probability 




is.oaib:.l±z...±&& 


730 




258 777 




8$ 


0.0053 




Protein name 




Locus 


Name 


Acc# 




ORF128 hypothetical 


protein 






gp:AF008210 


AF008210 




Description 




Buchnera aphidicola genomic fragment containing chaperone Hsp60 (groEL), DNA
biosynthesis initiating protein (dnaA), ATP operon (atpCDGAHFEB), and
putative chromosome replication protein (gidA) genes, complete cds; and
termination factor Rho (rho) gene, partial cds.




ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 




Score 


Probability 




2S£S.9A2^al^llA 




5953 


64 195




$2 


0.0001b 




Protein name 


Locus 


Name 


Acc# 




hypothetical protein ssrl765 


pir:S74779 


S74779 





Description 






ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


26220277_cl_2b2 


732 


5954 


193 


582 






Protein name 








Locus 


Name 


Acc# 


Description 














NO-HIT


ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


2fi3Lfi.7.Mi..±2...1^UL 


733 


5955 


341 


1026 






Protein name 








LOCUS 


Name 


Acc# 


Description 














NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 

Length 


Score 


Probability 


i6:iL^om...ai..:^ 


734 


5956 


116


351 


263 


1.2e-22



Protein name 



Locus Name 



|sp: YUAlJJcJuLl 



Acc# 



P42622 



Description 

HYPOTHliTlUAL U.b Kb MOTE IN 1*1 EXUk-TbUiJ IN TEkc^lU kl^luH 



— , , — ^ Score Probability 

|4.6e-b7 " 



ORF Name 



NTID 



AAID 



26.4.616.11^1..:^.. 



7T5" 



Length Length 
T5T 



Protein name 



Description 



Locus Name 
sp:HIS8_CANMA



ACC# 



P56099 



PHOSPHATE TRANSAMINASE) 



ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


292S387_cl__2ib 


736 


5958


197 


594 






Protein name 








Locus 


Name 


Acc# 


Description 














NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 
6.1e-145


19:±±±0.&D....a±Jl±± 


737


5959 


724 


2175 


1417 





Protein name 



Locus Name 



Acc# 



sp:DCP_ECOLI



Description 

PEPTIDYL-DIPEPTIDASE DCP (DIPEPTIDYL CARBOXYPEPTIDASE)



ORF Name NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 
2.7e-55 


10.1$.6.^.±1...A& 73 8 


5960 




758 571 




Protein name 






Locus Name 


Acc# 


uridine kinase udk






pir:G69728


G69728 



Description 



ORF Name 



NTID 



2.2.22S.40.B...±3....17.6. 



Protein name 



unknown 



Description 



— — Score Probability 



AAID Length Length 



|139b 



T7T 



Locus Name 



gp:AF086638



Acc# 



AF086638 



Pseudomonas putida CumA precursor (cumA) and CumB (cumB) genes, complete
cds; and unknown genes.






ORF Name 



33235905 c3 38b 



Protein name 



NT ID 



AAID 



NT 



AA 



— — , Score Probability 
Length Length ^ 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



NT ID 



3.41S.M3.B....tl...3.5 I 



7TT 



Protein name 



NT AA 

— , — , Score Probability 
AAID Length Length 



T5T 



72TT 



¥TCT 



Locus Name 



2.4e-40 



Acc# 



sp:YHHW_ECOLI



P46852 



Description 

HYPOTHETICAL 26.3 KD PROTEIN IN GNTR-GGT INTERGENIC REGION (F231)



ORF Name 



NT ID 



NT AA 

— ^ -r — " S core Probability 
AAID Length Length — - 



742 



493 



1482



i.£e-34 



Protein name 



Locus Name 



damage-inducible protein PAB0243



pir:A75151



Acc# 



A75151 



Description 



ORF Name 



NT ID 



3.£.D.5.6.b.lu...al...2b.l I UV5 



Protein name 



NT 



AA 



AAID Length Length 
5965 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



16±12$±2..a2J2&'L. 



Protein name 



NT ID 



AAID 



NT AA 

— ^ — , Score Probability 
Length Length 



5966 



1ST" 



5TJ¥" 



0.00018 



Locus Name 



hypothetical protein SCAE^.OB sC:2E9.08 



pir:T34819



ACC# 



T34819 



Description 



1# 



NT 



AA 



ORF Name 



NTID 




AAID Length Length 



Score Probability 



5¥T 



1017 



ITT 



Protein name 



Locus Name 



Acc# 



hypothetical protein F19D11 . lb : nypotneticai 
protein F14M4 . 29 :hypothetical protein F14M4.29 



Description 







NT 


AA 

— , Score 


Probability 


ORF Name 


NTID AAID 


Length 


Length 


3.S.2.S.5.S0...±i...lb.l 


- 745 5968 


1054 


3165 325 


4.1e-45



Protein name 



Locus Name 



115K outer membrane protein precursor: susC
protein



pir:JC6027



Acc# 



JC6027 



Description 



ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


laimi^ti^ii:/ 


747 5969


416 


1251 


1561 




Protein name 






Locus Name 


Acc# 








sp:CHUR_BACTN


Q02550 


Description 














CHONDRO-6-SULFATASE REGULATORY PROTEIN












ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


3.MB.b.6.Z...cl...217. 


748 5970


204 


615 


373 


2.6e-34 


Protein name 






Locus Name 


Acc# 








spiYl^O^METTM 


O26223


Description 


















ORF Name NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


4022312_cl_209 749 


5971 


2S2 t 


349 


141 


2.7e-09 


Protein name 


Locus 


Name 


Acc#


ferredoxin (fdx-3) homolog


pir:C69294 


C69294 


Description 












ORF Name NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


40£3.m...cl..3.ii:/. 750 


5972 


301 


906


127 


i . ye - 14 


Protein name 


Locus 


Name 


Acc# 


leader peptidase Lep 


gp:AF188620 


AF188620 


Description 












Bordetella pertussis lep operon, complete sequence.




1 


ORF Name NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


41l7.IttU..±A...ia:/. 7 51 


5973 


426 


1281


1151 


9.4e-117 


Protein name 






Locus 


Name 


Acc# 








sp:SR54




P37105 


Description 














"SIGNAL RECOGNITION Mk'i'icJLE 


PROTEIN 


(FIFTY-FOUR HOMOLOG)






ORF Name NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


TA6±S£2^1^1A 


5974 


216 


651 


216 


1.1e-17


Protein name 






Locus Name 


Acc# 


hypotheticai protein FABr/bi 




D75137 



Description 



NT 



AA 



ORF Name 



NT ID 



75T 



AAID Length Length 




T5T 



Score Probability 
1.2e-20



Protein name 



Locus Name 



ferric uptake regulator homolog



gp:AF095596



Acc# 



AF095596 



Description 



Staphylococcus aureus strain 1SP3 ferric uptake regulator homolog (furB)
gene, complete cds.



ORF Name 



Protein name 



synthase III 



Description 



NTID 



NT 



AA 



AAID Length Length 



Score Probability 



Locus Name 



pir:F70394 



1.4e-67 



Acc# 



F70394 



ORF Name 



Protein name 



Description 



NTID 



AAID 



NT AA 

Score Probability
Length Length - 



PUTT 



1.1e-62



Locus Name 



sp:HIS1_SALTY



Acc# 



P00499 



ATP PHOSPHORIBOSYLTRANSFERASE,



NT 



AA 



ORF Name 



NTID 



AAID 



5978 



Length Length 
T5T" 



Score Probability 



14 . 9e-33 



Protein name 



Description 



Locus Name 



sp:SMPB_BACSU



Acc# 



O32230



SMALL PROTEIN B HOMOLOG






ORF Name 



NT ID 



4960812 t± Ibi 



757" 



Protein name 



Description 



^ — Score Probability 



AAID Length Length 



¥77" 



2W 



1.8e-26



Locus Name 



sp:THIO_BORBU



Acc# 



O51088



THIOREDOXIN (TRX)



ORF Name 



517SS7S cl> i2U 



Protein name 



NTID 



75S~ 



AAID 



NT 



AA 



Length Length 
575 



Score Probability 



Locus Name 



Acc# 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



|6.7e-13y 



Protein name 

raw starch digesting amylase precursor 



Locus Name 



gp:AF067653



ACC# 



AF067653 



Description 



Cytophaga sp. raw starch digesting amylase precursor, gene, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



7£TT 



|2.0e-2u 



Protein name 



Locus Name 



thioredoxin-like protein



gp:ATAC010718



ACC# 



AC010718 



Description 



Arabidopsis thaliana chromosome I hkC fc'^uib genomic sequence, complete 
sequence . 






ORF Name 



[605651^ cJ Jfcib 



Protein name 



NTID 



7^T~ 



NT 



AAID Length Length 
1014 



— Score Probability 



TT7~ 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



7£T 



AAID 



NT 



Length Length 



AA 

Score Probability 



T0"5TT 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



5985 



conserved hypothetical protein BBUiyb 
Description 



T7TT 



TIT 



Locus Name 



pir:C70124



4.9e-0b 



Acc# 



C70124 



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



SAAlll^lJiS.l 



TUT 



3T2~ 



T2TT 



Locus Name 



lsp:YkkX_fc!Tk<Ju 



Description 



1.7e-07



Acc# 



P37977 






NT 



AA 



ORF Name 



NTID 



AAID 



7FT 



Length Length 
TIT™ 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



$&6^Al...c.±JXAL I 



Length Length 

2ft 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



mBM.l...c±Jl&B. I [7F7 



Length Length 
TSTT 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



7TF 



i.5e-10 



Protein name 



Locus Name 



hypothetical protein PH1670 



pir:F71047



Acc# 



F71047 



Description 



ORF Name 



Protein name 

Description 
NO-HIT 



NTID 



AAID 



NT 



AA 



Length Length 
75 1 PT7 



Score Probability 



Locus Name 



Acc# 






ORF Name 


NTID AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


1256885_J:3_133 


770 5992


382 114 5* biu 


7.6e-4$ 


Protein name 


Locus Name 


Acc# 


Man26A


gp:AF126471 


AF126471 


Description 




Cellulomonas fimi Man26A (man26A) gene, complete cds.






ORF Name 


NTID AAID 


NT 
Length 


AA 

— Score 
Length 


Probability 


i277325b_c2_2ay 


771 5993


519 1560 41/ 


5.7e-39 



Protein name 



Locus Name 



conserved hypothetical protein



pir:B72391



Acc# 



B72391 



Description 



ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


±llQM.±b...a±...l±± 


772 5994


522


1569 


315 


6.8e-36 



Protein name 

Arylsulfatase precursor (EC 3.1.6.1)



Locus Name 



Acc# 



1 |gp:D507^I 



Description 

B . coli genomic 1>NA, Kohara c lone #280 133 . v-34 . 1 mm. 



ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


13.7.1l4i:/...cil...il^ 


773 5995


105 


321 


14$ 


1.4e-05 



Protein name 



Locus Name 



Acc# 



TRK system potassium uptake protein (trkA)
gp:U32745



Description 

Haemophilus influenzae Rd section 60 of 163 of the complete genome.






ORF Name 



NTID 



14551512 ti b 



T7T 



NT — ^ Score Probability 

i.2e-77 



AAID Length Length 



7WT 



Protein name 



Description 



Locus Name 



Acc# 



sp:VA(i(J_E(JuLl 



HYPOTHETICAL riVMPOk'JBK IN PhiRJ fe-ARGP IMTfiRgfitllC RKcjlui^J 



NT 



AA 



ORF Name 



NTID 



AAID 



14724042 cl 203 



T7T 



Length Length 
— 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



— Score Probability 



2TeT 



i.3e-4l 



Protein name 



Locus Name 



dimethylamine corrinoid protein MtbC



gp:AF102623



ACC# 



AF102623 



Description 



Methanosarcina barkeri dimethylamine corrinoid protein MtbC
(mtbC), trimethylamine methyltransferase MttB (mttB), trimethylamine corrinoid
protein MttC (mttC), putative transmembrane protein MttP (mttP), and
dimethylamine methyltransferase MtbB1 (mtbB1) genes, complete cds.



NT 



AA 



ORF Name 



NTID 



\205/l^h:z...a2^10:i 



TTT 



AAID Length Length 

t^h — 



Score Probability 
19.7e-l2b 



Protein name 



Locus Name 



Acc# 



P31971 



Description 
NADH-PLASTOQUINONE OXIDOREDUCTASE CHAIN 5,






NT 



AA 



ORF Name 



NTID 



AAID Length Length



Score Probability 



2117177 ±2 71 



WTT 



1.5e-31



Protein name 



endo-1,4-beta-mannosidase



Locus Name 
pir:D72278



Acc# 



D72278 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length



Score Probability 



FOTJT" 



1151 



6.0e-18



Protein name 



Locus Name 



renin-binding protein-related protein: protein
slr1975: protein slr1975



Description 



pir:S75649



Acc# 



S75649 



ORF Name 



2i&i5AU...±i...:L 



Protein name 

Description 
NO-HIT



NTID 



NT AA 

— • — , Score Probabi lity 
AAID Length Length



1809 



Locus Name 



Acc# 



ORF Name 



NT 



AA 



NTID 



iiiio.o.o.2.±i...io...... I nrr 



AAID Length Length
777 



Score Probability 
4.5e-44 



Protein name 



Locus Name 



Man26A 



gp:AF126471



Acc# 



AF126471 



Description 

Cellulomonas fimi Man26A (man26A) gene, complete cds.






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



22550917 c'A 317 



732" 



6004 



TTTT 



1PT 



0.036 



Protein name 



Locus Name 



Acc# 



endo-beta-1,3-glucanase precursor



gp:AJ?'01jlby 



Description 



Pyrococcus furiosus beta-glucosidase (celB) gene, complete cds; adh-lam
operon, complete sequence; biotin ligase BirA homolog (birA) gene, complete
cds; and 2-phosphoglycerate kinase (pgk) gene, partial cds.



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


237l2837J:2_8l 


783 


6005




377 


1134 


16S 


4.8e-12 



Protein name 



Locus Name 



conserved hypothetical protein scyc/.i4c 



pir:T35965



Acc# 



T35965 



Description 



ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 




784 


6006


398 1197 


283 


9.0e-2b 



Protein name 



Locus Name 



conserved hypothetical protein



Acc# 



B72278 



Description 



ORF Name 



NTID 



AAID 



— — Score Probability 
Length Length 



SWT 



Protein name 



TIT 



Locus Name 



Acc# 



Description 



NO-HIT






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score 



24415962 c'A Ju4 



T5TT 



Probability 
2.9e-19



Protein name 

NADH dehydrogenase (ubiquinone), 1 chain 1
RP795



Locus Name 



pir:E71640



Acc# 



E71640



Description 



ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 


2449.21A^tl^ 


787


6009


1075 


3228 


153 


1.4e-07 



Protein name 



Locus Name 



probable secreted glucosidase



pir:T35164



Acc# 



T35164 



Description 



NT 



ORF Name 



NTID 



AAID Length Length 



7\ 7\ 

— Score Probability 



^ilM^cUlfc | 



FOTu" 



1281 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


2.i5A40.b.2....Gl...2LO.b..„ 


-"" 7S9 5011 


405 


1218 


197 


2.4e-2£ 



Protein name 



Locus Name 



Acc# 



"alpha- 1, 3/4-iucosictase precursor 



U39394



Description 

Streptomyces sp. alpha-1,3/4-fucosidase precursor gene, complete cds.



NT 



AA 



ORF Name 



NTID 



24645437 c± 3&4 



f7W 



AAID Length Length 
2705 



TOT" 



Score Probability 
13 . 8e-08 



T7F" 



Protein name 



115K outer membrane protein precursor: susC
protein



Locus Name 
pir:JC6027



Acc# 



JC6027 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



I.ie-49 



Protein name 



probable glycosyl hydrolase



Locus Name 

pir:T36467



Acc# 



T36467 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



732- 



Length Length 




fTTT 



Score Probability 
|5.7e-S5 



Protein name 



Description 



Locus Name 



Acc# 



spiNBOHjiCjOLl 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



\2BAtlb.h.b....cxl^lb^ I 



[6015 



sir* 



4.4e-87 



Protein name 



Locus Name 



Acc# 



sp:TRKH_ECOLI



Description 

TRK SYSTEM POTASSIUM UPTAKE PROTEIN TRKH






ORF Name 


NT 

NTTD AAID Length 


AA 

— , Score 
Length 


Probability 


26230i>6bJ:2J>7 




lb / o 




Protein name 






Locus Name 


Acc# 


Description 










NO-HIT


ORF Name 


NT 

NTID AAID Length 


AA 

— , Score 
Length 


Probability 


2£i&ttm..±a...iajL 


795"' 6017 


660 J^b 


i.9e-3i 


Protein name 






Locus Name 


Acc# 


phosphoglycolate phosphatase (gph) homolog


i 


pir:C70184


C70184 


Description 


ORF Name 


NT 

NTID AAID Length 


— , Score 
Length 


Probability 


\l&\L6bA'^^XU& 


.796 6018 498 


1497 /iy 


1.9e-73 


Protein name 


Locus Name 


ACC# 


NADH dehydrogenase


(ubiquinone), chain




pir:S74687 


S74687 


4.2: protein slr1291: protein slr1291








Description 










ORF Name 


NT 

NTID AAID Length 


AA 

— , Score 
Length 


Probability 


16.1&A6:ll^cl^:.U 


79? 601$ bil 


1596 VJ» 


4.7e-8b 


Protein name 


Locus Name 


Acc# 


NADH dehydrogenase


(ubiquinone), I chain




pir:D70413


D70413 


nuoD2 









Description 






ORF Name 



NTID 



12658770a i2 bb 



NT — t n Score Probability 

3.8e-13 



AAID Length Length 
T5TT 



Protein name 



Locus Name 



unknown 



gp:U96771



Acc# 



U96771 



Description 



Prevotella bryantii putative polygalacturonase, B-1,4-endoglucanase,
mannanase genes, complete cds; and unknown genes.



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


265S4137J:2_7b 




799 


6021 


336 


1011 


253 


7.$e-26 



Protein name 



Locus Name 



methylcobamide:CoM methyltransferase isozyme



gp:AF013713



Acc# 
AF013713



Description 

Methanosarcina barkeri methylcobamide:CoM methyltransferase isozyme A
(mtbA), monomethylamine corrinoid protein (mtmC),
monomethylamine methyltransferase (mtmB), putative monomethylamine permease



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


16£A±:i±'^a2^Xti% 


... 800 


6022 


485 


1458 




725 


1.3e-71 



Protein name 



Locus Name 



sp:NU2C_SYNY3



Acc# 



P72714 



Description 
NADH-PLASTOQUINONE OXIDOREDUCTASE CHAIN 2,



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length ' 


Score 


Probability 


2$A$l£KL.c±...l0A 


801 


6023 


126 


381 


225 


- i.3e-iS 



Protein name 



Locus Name 



sp:NU3C_ANTFO



Acc# 
Q31792 



Description 

NADH-PLASTOQUINONE OXIDOREDUCTASE CHAIN 3, CHLOROPLAST,






NT 



AA 



ORF Name 



NTID 



31776708 iVl 



AAID Length Length 
TFB 



Score Probability 



I4.7e-12 



Protein name 



Locus Name 



NADH dehydrogenase (ubiquinone), I chain nuoB
pir:C70413



Acc# 



C70413 



Description 



NT 



ORF Name 



NTID 



125.12&1&.±2...3.0. 



AAID Length Length 



AA 

— Score Probability 



sir 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



6026



Length Length 



— ^ Score Probability 
0..04b.' 



Protein name 



Locus Name 



hypothetical protein



pir:C72397



Acc# 



C72397 



Description 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


16±12b.8h^cxl^l6A 


805


6027


114 


345 






Protein name 








LOCUS 


Name 


Acc# 


Description 














NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


16.16.0.^^2^1^. 


806


6028


172 


515 


204 


2.1e-16 



Protein name 



Locus Name 



"NADH dehydrogenase (ubiquinone) , I chain J | [pir:C7ia5T 



Acc# 



C71839 



Description 






ORF Name 



NT ID 



AAID 



37TT 



— — Score Probability 
Length Length 

7.6e-41



2799 



Protein name 



Locus Name 



sensory transduction histidine kinase
slr2098: protein slr2098: protein slr2098



Description 



pir:S75130



Acc# 



S75130 



NT 



AA 



ORF Name 



NT ID 



AAID 



Length Length 



Score Probability 

i.4e-aa — 



Protein name 



Locus Name 



NADH dehydrogenase I, subunit nuoB



gpTECTrUTO" 



Acc# 



X68301



Description 



E.coli DNA sequence of nuo operon.



ORF Name 



NT ID 



AAID 



— — Score Probability 
Length Length 

WITZ 



TTT7T" 



TUT 



2.4e-ai 



Protein name 



receptor antigen (RagA) 



Locus Name 
"I |gp;PGIi:Aug 72" 



Acc# 



AJ130872 



Description 



Porphyromonas gingivalis W50 receptor antigen (rag) locus encoding a major
immunodominant 55kDa antigen.



ORF Name 



NT ID 



AAID 



^5 — Score Probability 
Length Length 

6 .3e-18 



T7T 



Protein name 



Locus Name 



S±pl protein 



pir:S27762



Acc# 



S27762 



Description 






ORF Name 




NT ID 


AAID 


NT AA 
Length Length 


Score 


Probability 


4566876_c2_285 


811 


6033 




1464 


411 




Protein name 










Locus Name 


Acc# 












sp:YIDJ_ECOLI


P31447 


Description 
















HYPOTHETICAL «iV 




IN ElMkb- 


-GLVG m 


TEftSEMlcJ 


REGION 


1 


ORF Name 




NT ID 


AAID 


NT AA 
Length Length 


Score 


Probability 


4$7S3l3_c3__3e>9 




812 






3S / 






Protein name 










Locus Name 


Acc# 


Description 
















NO-HIT


ORF Name 




NT ID 


AAID 


NT AA 

— Score 
Length Length 


Probability 


£iilfl3La„.c2L...iaex 


815 


603S 


105 


318 


231 


2 . 9e-iy 



Protein name 



Locus Name 



sp:N0LC>LEBO 



Acc# 



Q00244 



Description 
NADH-PLASTOQUINONE OXIDOREDUCTASE CHAIN 4L,



ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


S£5M.XL±i..±%6. 


814 6036 


1380 


4143 


519- 


4 . 9e-46 



Protein name 



Locus Name 



utilizxng regulatory protein cunU 



IgptT ' l ' UbVaOu 



Acc# 



U57900 



Description 

Thauera aromatica utilizing regulatory protein tutC (tutC), utilizing
regulatory protein tutB (tutB), putative DNA binding protein TutB1 (tutB1),
and putative protein kinase TutC1 (tutC1) genes, complete cds.






NT 



AA 



ORF Name 



NTID 



AAID 



164441:47 l^y 



FuTT 



Length Length 

— 



Score Probability- 
IS. 8e-06 



ITS 



Protein name 
CmuC protein 



Locus Name 
bp:MaP01lJlV 



Acc# 
AJ011317 



Description 



Methylobacterium sp. CJM4, co bB, met!? 1 , cmuB, cmuc, partial cose anctcoDy, 
genes and genes encoding Orf219 and Orf361. 



ORF Name 



NTID 



NT AA 
* — Score 

AAID Length Length 



7074lbb tl 1 



Probability 
1.4e-l4 — 



Protein name 



Locus Name 



unknown 



[gp:TO6771 



Acc# 



U96771 



Description 

Prevotella bryantii putative polygalacturonase, B-1,4-endoglucanase, and
mannanase genes, complete cds; and unknown genes.



ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 


iiio±b±^...iy± 


817 


6039 


686 2061 


1366 






Protein name 






Locus 


Name 




ACC# 








sp:DXS_HAEIN




P45205 


Description 














1-DEOXYXYLULOSE-5-PHOSPHATE SYNTHASE (DXP SYNTHASE)






i 


ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 


33.ai&a...Gi...iait 


.... 818 


6040 


512 1539 


359 







Protein name 

Description 
HEXURONATE TRANSPORTER



Locus Name 



sp:EXUT_ECOLI



Acc# 



P42609 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



5954506 ri J 



3.8e-i45 



Protein name 



beta-xylo-glucosidase 



Locus Name 
] |gp:T^bb2 7"5~ 



Acc# 



Z56279 



Description 

T.brockii cglF, cglG, xglS and cglT genes.



ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


33212528_c3_il 


820 




554 






Protein name 








Locus Name 


Acc# 


Description 












NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


l±±$.$.12..±i..M 


821 




71 


216 53 


0.017 


Protein name 








Locus Name 


Acc# 












Q95152 


Description 












GLYCOPROTEIN i« 


PRECURSOR 


(GP3S) (MUClN-TyPE 


MEMBRANE PROTEIN 


GP40) | 


ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


10mill)J..±L..<LL 


822 


6044 


177 


534 




Protein name 








Locus Name 


ACC# 


Description 
















NT 



AA 



ORF Name 



NTID 



WIT 



AAID Length Length 
— 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



ORF Name 



NTID 



AAID 



NT AA 

— ^, — _ Score Probability 
Length Length 



10:i±AMA..±±..AL I 



OT3F" 



S3T" 



TT5F" 



or 



1.2e-6i 



Protein name 



Locus Name 



Acc# 



sp:DINP_ECOLI



Description 
DNA-DAMAGE-INDUCIBLE PROTEIN P



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



l lQ5L&aai6....£l...&& I 152"? 



FT" 



189 



TUT" 



|1.4e-0S 



Protein name 



Locus Name 



hypothetical protein APE2457



pir:H72476



Acc# 



H72476 



Description 



ORF Name 



NTID 



AAID 



\l012±T.L.al.A22 1 



Protein name 

Description 
PREPROTEIN TRANSLOCASE SECA SUBUNIT



NT 



AA 



Length Length 
TTTTT 



TJJT 



Score Probability 
37T " 



Locus Name 



sp:SECA_RHOCA



3.2e-iS4 



Acc# 



P52966 






NT 



AA 



ORF Name 



NTID 



WIT 



AAID Length Length 
TZUZ 



TUT 



Score Probability 



Protein name 

115K outer membrane protein precursor: susC protein



Locus Name 



pir:JC6027



Acc# 



JC6027 



Description 



NT 



AA 



— — , Score 



ORF Name 



NTID 



AAID Length Length 



WIW 



1TW 



TWT 



Probability 
5.8e-4S " 



Protein name 



Description 



Locus Name 



sp:APBE_HAEIN



Acc# 



P44550 



THIAMINE BIOSYNTHESIS LIPOPROTEIN APBE PRECURSOR



NT 



AA 



ORF Name 



NTID 



AAID 



WIT 



Length Length 
TWI3 



Score 



Probability 
1.3e-40 



Protein name 



Description 



Locus Name 



sp:STS_RAT



Acc# 



P15589 



SULFATE SULFOHYDROLASE) (ARYLSULFATASE C) (ASC)



ORF Name 



NTID 



AAID 



^ ^ Score 
Length Length 



\XlOA^...^Ji.6:^.... I 



^1 1 11626 



1555" 



Probability 



Protein name 



Locus Name 



sp:RLUA_ECOLI



Acc# 



P39219 



Description 

(PSEUDOURIDYLATE SYNTHASE) (URACIL HYDROLYASE)






ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


T28775J:i_5i 


831 


6053 


166 


501 








Protein name 








Locus 


Name 




Acc# 


Description 
















NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability
i.7e-46 


i2a9.:y.b.b.A...ci...44i. 


832 


6054 


307 


924 


488 







Protein name 



oxidoreductase, short chain
dehydrogenase/reductase family



Description 



Locus Name 



pir:E72427



Acc# 



E72427 



ORF Name 



Protein name 



NT AA 
— — Score 



NTID 



AAID Length Length 



Probability 



WIT 



TZTT 



Locus Name 



Acc# 



Description 
HYPOlWl'lOAL b4.H KU Pkoimai lis! P'lA 



ORF Name 



NTID 



AAID 



NT 

Length Length 



AA 

— Score 



13.8.B.126.^.±Z...lub... 



1338 



Protein name 



Locus Name 



Probability 



Acc# 



Description 



NO-HIT






NT 



AA 



ORF Name 



NTID 



AAID 



57T 



Length Length 
BT5T 



Score Probability 



166 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



14MJ.9.fo.2....al...i.i.i.. 



Length Length 
TT5I — 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



±All±lb.L..alJ±Ll).... 



WIT 



Length Length 
2022 



ZTT 



Score Probability 
li.ie-7^ 



TTW 



Protein name 



Locus Name 



type III DNA modification enzyme
(methyltransferase)



pir:F71810



Acc# 



F71810 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 
1.2e-27 



Protein name 



Locus Name 



Acc# 



probable beta-glycosyltransferase trsC



pir :S5i^62 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






ORF Name 



Protein name 



NTID 



6062 



NT 



AA 



AAID Length Length 
TTT 



Score Probability 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



Protein name 



Description 



NT 



AA 



NTID 



AAID Length Length 



— Score Probability 



6063 



2.5e-2^ 



Locus Name 



Acc# 



lsp:YM0Ji!cJoLl 



HYPOTHETICAL ^b.b Rb frkoTlaim IN DHM-1NT& M'KRC^lic rkgiun 



ORF Name 



ifiiaia.„ci...4feLfa.. 



Protein name 



NTID 



AAID 



TOST" 



— — S core Probability 

Length Length 





153 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name



NT 



NTID 



AAID Length Length 



— score Probability 



TTST 



| l.ge-80 



Locus Name 



Acc#



GTP-binding protein



gp:AF019407



AF019407 



Description 

Caulobacter crescentus GTP-binding protein (cgtA) gene, complete cds.






ORF Name 


NTID 


AAID 


NT AA 

— Score 
Length Length 


Probability 
- 3.2e-06 


i682946i_t2J.l2 


j 844 


6066 


TT9 360 108 




Protein name 






Locus Name 


Acc# 


1 hypothetical prote 


in PHoibO 




pir:E71143


E71143 


Description 


ORF Name 


NTID 


AAID 


NT AA 

_ — Score 

Length Length 


Probability 


±6&12&8.h..±2..±^. 


.. 845 


6067 


TTL — 1256 "' 1723 


2 . 3e-177 | 



Protein name 



Locus Name 



Acc# 



hypothetical protein 




pir:JQ1020


JQ1020 


Description 










ORF Name NTID


AAID 


NT AA 
Length Length 


Score 


Probability 
" 2.6e-162 


15..7.0.3A6.1.-Cl...ili 846 


6068 


481 1446 


1581 




Protein name 




Locus Name 


Acc# 


[ unknown 




gp:AF048749


AF048749 


Description 

Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.












ORF Name NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 
2.0e-2y 


19.7.25.2SA..C.2...MJ. 84 7 


6069 


im "609 


327 





Protein name



Description 



Locus Name 
sp:Vli^_MlilTTH 



ACC# 



O27840







ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


I97971^J:2J.i9 


848 


6070 


357 1074 iuys 


" 8.1e-ili 


Protein name 


Locus 


Name 


Acc# 


nucleotide sugar epimerase 


gp:AF059755


AF059755


Description 


Vibrio vulnificus nucleotide sugar epimerase gene, complete cds.




ORF Name 


NTID 


AAID 




NT 
Length 


AA 
Length 


Score 


Probability 






6071 




355 1068 


153 


8 . 7e-l5 


Protein name 










Locus 


Name 


Acc# 


1 lumO protein: protein siri2±jj 


: protein 


slri2ii 


pir:S77548


S77548 


Description 


ORF Name 


NTID 


AAID 




NT 
Length 


AA 
Length 


Score 


Probability 


20.DS0.40.2...±1...2b.i 


850 


6072 




163 492 


129 


1.8e-07 


Protein name 










Locus 


Name 


Acc# 


phosphopyruvate hydratase








pir:C75251


C75251 


Description 


ORF Name 


NTID 


AAID 




NT 
Length 


AA 
Length 


Score 


Probability 


2iD.ag.7.7.s.i..±3....ii.y. ..... 


851 


6073 




319 960 


1657 


2.3e-i7U 


Protein name 


Locus 


Name 


ACC# 


putative UDP-GlcNAc:undecaprenylphosphate


gp:AF048749 


AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



ORF Name NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability
" 9.6e-21 


T037502J:ljL7 852 6074 


256 


771 


245 




Protein name 




Locus 


Name 


Acc# 


conserved hypothetical protein




pir:D72320


D72320 


Description 


ORF Name NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


2H7.M6.25...±3....2.D.a 853 6075 


223 


672 


225 


|l.3e~l« 



Protein name 



Locus Name 



hypothetical protein



gp:SSU18930



Acc# 



Y18930 



Description 

Sulfolobus solfataricus 281 kb genomic DNA fragment, strain ■



ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 


203£Al±l.±l..±W 


"... 554 


5076 


347 1044 


1691 


" |5.7e-174 



ft) 



Protein name 



TOE ) -glucose-4-epi merasey / cLTUP-giucose-4 / b 



Locus Name 
gp:AF048749



Acc# 



AF048749 



Description 

Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



ORF Name 



NTID 



21l5.l0...±l...b.7.... 



£"55" 



55 M score Probability 

AAID Length Length — 

|2.6e-27 



1029 



Protein name 



activator protein 



Locus Name 
gp:AF047527



Acc# 



AF047527 



Description 

Pseudomonas fluorescens activator protein (mtlR) gene, complete cds.






ORF Name 



NTID 



AAID 



— — S core Probability 

Length Length ™ 



[ , 21640tia7 - t2 - _ll , r 



[5TT71T 



|i.7e-08 



Protein name 
hypothetical protein 7.17



Locus Name 
pir:D47b7V 



Acc# 



Description 



ORF Name 



NTID 



— — Score Probability 

AAID Length Length 



|2.15.S.lb.b.^ai^M^ I |357 



"715T 



5.2e- 7 9 



Protein name 
thiophene and furan oxidation protein
Description 



Locus Name 



Acc# 



C70375 



ORF Name 



NTID 



— — score Probability 

AAID Length Length 



i2as3L&a^....ta^isAu ipss 



|3.7e-llb 



Protein name 



putative methyltransferase



Locus Name
gp:AF048749



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



— — Score Probability 



ORF Name 



NTID 



AAID Length Length 



22M0.yJ.:A..±l...i 



5081" 



l.le-4b 



Protein name 



Locus Name 
sp:STS_HUMAN



Acc# 



P08842 



Description 

SULFATE SULFOHYDROLASE) (ARYLSULFATASE C) (ASC)



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



5T" 



0.031 



Protein name 



Locus Name 



sp:SPRC_XENLA



Acc# 



P36378 



Description 

(OSTEONECTIN) (ON) (BASEMENT MEMBRANE



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score 



124 



T75~ 



TIT 



Probability 
IS.?e-0S 



Protein name 



phosphopyruvate hydratase



Locus Name
pir:C75251



Acc# 



C75251 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



183 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



23.6.4.7.7.b.B....c^....3.y.y... 



WT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 

12103.6.2.b.^al^ll 



NTID 



AAID Length Length 



J7T 



— t , Score Probability 
2.2e-bl 



627 



Protein name 



Locus Name



dolichol-phosphate mannosyltransferase



pir:G70463



Acc# 



G70463 



Description 






NT 



AA 



ORF Name 



NTID 



24064142 i'A 148" 



AAID Length Length 




SWT 



Score Probability 
3 .8e-33 



JET 



Protein name 

hypothetical protein ywnB



Locus Name 



pir:E70063



Acc# 



E70063 



Description 



NT 



AA 



ORF Name 



NTID 



\1±±11±±1^1J1.1± I 1355 



AAID Length Length 




Score Probability 



I23T" 



Protein name 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



NTID 



— — Score Probability 



AAID Length Length 



867 



7TT 



i.9e-24- 



Protein name 



Locus Name 



hypothetical protein yisx 



pir:G69838



ACC# 



G69838 



Description 



ORF Name 



NTID 



AAID 



NT AA Score Probability 
Length Length 



515W 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



ORF Name 



NTID 



AAID Length Length 



— Score Probability 



242571tt7 f2 ISA 



9.0e-112 



Protein name 

putative carboxybiotin decarboxylase subunit:
of 



Locus Name 



gp:MRU87980



Acc# 



U87980 



Description 

Malonomonas rubra putative IS-element gene, partial cds, and malonate
decarboxylase gene cluster (madY, madZ, madG, madB, madA, madE, madC, madD,
madH, madK, madF, madL, madM, madN) genes, complete cds.



ORF Name NTID AAID Length 


— , Score 
Length 


Probability 


24401S07 cl 299 S70 6092 510 


T533 2702 


4.2e-2^1 


Protein name 


Locus Name 


Acc# 


| unknown 


gp:AF048749


AF048749 


Description 

Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.








ORF Name NTID AAID Length 


AA 

— , Score 
Length 


Probability 
0.0037- 


s£ma2L...G2...m hit- 6093 ezz 


1929 110 




Protein name 


Locus Name 


Acc# 



sp:Y0bWJ4Y<JLU 



Q49757 



Description 

riVTOl'HhlTlCAl, 31.1 k£ £>koTJ b ili>J B 1537Jte_39 






ORF Name 



NTID 



^ — score Probability 



AAID Length Length 



2447281V 12 106 



Protein name 



Locus Name 



putative hemolysin



gp:AF051356



Acc# 



AF051356 



Description 



Streptococcus mutans YtqB (ytqB) gene, partial cds; ABC transporter (abcX),
putative permease (perM), putative hemolysin (hlyX), pyruvate-formate lyase
activating enzyme (pflC), D-alanine-D-alanyl carrier protein ligase (dltA),
integral membrane protein (dltB), D-alanyl carrier protein (dltC),
extramembranal protein (dltD), and putative exopolyphosphatase (ppx1) genes,



ORF Name 



NTID 



— — Score Probability 
AAID Length Length 



toAAStBMljJ&.-SAh J m? 



1500 



KIT 



0.00014 



Protein name 



Locus Name 



immunogenic 75 kDa protein PG4



gp:AF145800



Acc# 
AF145800 



Description 

Porphyromonas gingivalis strain W50 immunogenic 75 kDa protein PG4 gene,
complete cds.



ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 




2.4.fi3JLi.0i...±2...2lJb 


§74 5056 


183 552 




1 . Oe-bb 




Protein name 




Locus Name 


Acc# 




unknown 


gp:AF048749


AF048749 




Description 




Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.




ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


2AMll&l^&Ji:& 


875 B"097 


498 1497 


6&5 


2.3e-67 



13 



Protein name 



Locus Name 



sp:RIBB_ECOLI



Acc# 



P24199 



Description 






ORF Name 



|24651blb yl 



Protein name 



NT ID 



NT 



AA 



AAID Length Length 
TZ1 



Score Probability 



TOTT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



— — Score Probability 



NT ID 



AAID Length Length 



l ^AOA^al^ll | 

Protein name 



TTTT 



Description 



WIT 



probable uridine phosphorylase AFi^iub



Locus Name 
pir:D72516



4.ie-i6 



ACC# 



D72516 



ORF Name 



Protein name 



NT ID 



WW 



NT 



AAID Length Length 



— Score Probability 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



Protein name 



NTID 



W7T 



AAID 



NT 
Length 



AA 
Length 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



hypothetical protein siiibVl 



Description 



— — Score Probability 
Length Length 





TEHT 



l .4e-iv 



Locus Name 



pir:S74655



Acc# 



S74655 






ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


262<5i3i:JJti_bfc 


881 


6103 


63 


192 






Protein name 








Locus 


Name 


Acc# 


Description 














NO-HIT


ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


26AAA6KL.cxz...a^jl 


882 


5104 


| 434 


1305 


1588 


" 4.6e-163 



Protein name 



Locus Name 



sp:ENO_STAAU



Acc# 



O69174



Description 

GLYCERATE HYDRO-LYASE) (LAMININ BINDING PROTEIN)



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



883 



9 . 9e-90 



Protein name 



Locus Name 



Acc# 



putative hypoxanthine guanine



gp:AF048749



AF048749 



Description 

Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



NT 



AA 



ORF Name 



NTID AAID Length Length 

TT7 



Score Probability 



T5T 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT







ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 




385 6107 


17b b^S 










Protein name 






Locus 


Name 




ACC# 


Description 


















NO-HIT 




ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 




Score 


Probability 


16££Ai)±2...a2.,Ai)A 


885 5108 


355 


L058 


ii^: 






Protein name 


LOCUS 


Name 




ACC# 
AJ131708 




gamma response I protein


gp:ATH131708 






Description 
















Arabidopsis thaliana gr I gene, exons 1-3.












ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 




Score 


Probability 




26J£±0A±..a2...16±... 


887 5l09 


1017 


3054 




527 






Protein name 






LOCUS 


Name 




ACC# 
AF060119 




restriction endonuclease




gp:AF060119






Description 














Pasteurella haemolytica methyltransferase (mod) and restriction endonuclease
(res) genes, complete cds.




ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 




Score 


Probability 



Z6.&lLh.tLil.±l..±iX* 


888 5110 


416 


1251 




1189 






Protein name 






Locus 


Name 




Acc# 
AF144640 




immunoreactive 47 kjj antigen PO'X'AU 




gp:AF144640






Description 

















Porphyromonas gingivalis strain W50 immunoreactive 47 kD antigen PGi^u gene,
complete cds.







ORF Name NT ID 


AAID 


NT 
Length 


AA 

— Score 
Length 


Probability 
2.7e-06 




275i2b__t2__lli 889 


6111 


469 : 


1410 HI 




1 


Protein name 






Locus Name


ACC# 




hypothetical protein kv^j^c 






pir:F70705


F70705 




Description 












ORF Name NT ID


AAID 


NT 
Length 


— Score 
Length 


Probability 


"1 


28±l&i:L±±Jlb:i 890 


6112 


270 | 


8l3 ^8 


Z . JC ZD 


1 


Protein name 






Locus Name 


Acc# 










sp:YFIH_HAEIN


P44552 




Description 












HYPOTHETICAL PROTEIN HI0175








1 




ORF Name NT ID 


AAID 


NT 
Length 


— , Score 
Length 


Probability 
7.9e-07 




29A3.6.u4u..±I...3.b. 8 91 




420 


1263 144 






Protein name 






Locus Name 


ACC# 




NADH dehydrogenase (ubiquinone) , chain z


pir:T11319


T11319 




Description 












ORF Name NTID 


AAID 


NT 
Length 


— , Score 
Length 


Probability 




T9:.7.D.3.I6.5...±2...11B. 8 92 


5114 


396 


1191 369 


6.9e-34 




Protein name 






Locus Name 


Acc# 










sp:CAPA_BACAN


P19579




Description 













CAPA PROTEIN






NT 



AA 



ORF Name 



NTID 



AAID 



30084688 ±2 l-Al 



Length Length 



— Score Probability 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



lb.Z\^l^J>.ll | p£ 



AAID Length Length 
[TF75 



Score Probability 
|2.ie-55 



\5TF 



Protein name 
alkaline phosphatase



Locus Name 



|gp:ayi>i>HoA^ 



Acc# 



Z48801 



Description 



Synechococcus PCC7942 phoV gene for alkaline phosphatase.








ORF Name NTID 


AAID 


NT 
Length 


AA 

— Score 
Length 


Probability 





l±110Al±..c,2JiAl 895 


6ll7 


182 


345 204 








Protein name 






Locus Name 




ACC# 
C72360 




DNA polymerase III, alpha subunit




pir:C72360






Description 














ORF Name NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability
7.2e-07 


12Ai)Alh..±l.,±ii.l 8 96 


6118 


135 


405 115 






Protein name 






Locus Name 




Acc# 
E71837 




protein-export membrane protein




pir:E71837






Description 














ORF Name NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 




3.3.3.D.m0....a2...3.aO. 8 97 


6119 


' 421 


1266 








Protein name 






Locus Name 




ACC# 




Description 















NO-HIT






* 



NT 



AA 



ORF Name 



NTID 



AAID 



33357811 ±1 4i> 



Length Length 

— 



WW 



Score Probability 
10.00042 



TIB" 



Protein name 



Locus Name 



histidine kinase sensor protein



pir:D70328



Acc# 



D70328 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



3.3.4B^Ml..±Z...llu.. 



1WT 



0.04b 



Protein name 



Locus Name 



sp:TPMN_XENLA



Acc# 



Q01174 



Description 

TROPOMYOSIN ALPHA CHAIN, NON-MUSCLE



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



wr 



T5T 



ST 



TT7MT 



Protein name 



Description 



Locus Name 



Acc# 



sp:YA4y_HAElJ>J 



HYPOTHETICAL PkuTHI N Hllu4^ 



NT 



AA 



ORF Name 



NTID 



AAID 



3AlS.$.3.&£..±l...i.B... 



90T 



Length Length 
936 



3X1 



Score Probability 
3 . Oe-49 



514 



Protein name 



Locus Name 



fgpTHCYTITTS" 



Acc# 



Y11138 



Description 

B.cereus DMA tor okVl, ukl^ a na ukfj sp; 






ORF Name 



34407133 ti 47 



Protein name 



NT ID 



JUT 



NT 



AA 



AAID Length Length 
ffTB 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



JUT 



ZTZT 



glycosyl transferase pai3<j77^ 



Description 



NT 



AA 



AAID Length Length 



Score Probability 



TUT 



TJT 



Locus Name 



pir:B75096



1 . 4e-14 



Acc# 



B75096 



ORF Name 



Protein name 



NTID 



NT 



AA 



AAID Length Length 



Score Probability 



ZT 



T5T 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



— — Score Probability 

Length Length ■- 

1J~ 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



Protein name 



NTID 



— — Score Probability 
AAID Length Length ■ 



TUT 



6128 



TTT 



TTU~ 



6.0e-bl 



Locus Name 



pyridoxal phosphate biosynthetic protein PdxA | pir:H70373



Acc# 



H70373 



Description 






NT

ORF Name NT ID AAID Length


AA 

— • Score 
Length 


Probability 


346664b3J:3_224 | 6129 172 






Protein name 


Locus Name 


Acc# 


Description 




1 


NO-HIT




1 


ORF Name NT ID AAID Length 


AA 

— , Score 
Length 


Probability 


3masi.7...±a...24ii sub 6130 123 


372 221 


3.3e-l8 


Protein name 


Locus Name 




hypothetical protein


pir:H75473


H75473 


Description 






* — — Score 
ORF Name NT ID AAID Length Length 


Probability 
l.Se-122 


3.M40.0.b,..El-3.5 yoy 440 


1323 ±ZV5 




Protein name 


Locus Name 


Acc# 


putative UDP-glucose dehydrogenase


gp:AF159428


AF159428 


Description 




Burkholderia pseudomallei putative UDP-glucose dehydrogenase (udg), putative
ADP-heptose synthase (waaE), and putative ADP-glycero-mannoheptose epimerase
(gmhD) genes, complete cds.




— Score 

ORF Name NTID AAID Length Length 


Probability 


3aaaaia...c2...aa£,. sio 6 132 699 


2100 Jbl4 


0.0 


Protein name 


Locus Name 


Acc# 



receptor 



Description 

Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.






ORF Name 



NTID 



Protein name 



probable galactosyltransferase trsB



Description 



NT 



AA 



AAID Length Length 



1179 



Score Probability 
|1.3e-31 



TTT 



Locus Name 



|pir:S5126i 



Acc# 



ORF Name 



Protein name 



NTID 



AAID 



— — Score Probability 
Length Length 

7W9 



AA 



1ST 



Locus Name 



Acc# 



Description 
NO-HIT 



ORF Name 



Protein name 



Description 



NTID 



— — S core Probability 

AAID Length Length ; 



[9TT 



1613b 



ll.8e.-I06 



Locus Name 



Isp : YOfAJiAUfcW 



Acc# 



P54466 



BV&OTUETICAL ib.6 KB Mo'l'ElN IN RPSU- PHOH InI'IiIregeniu Knl<Jiuj>i 



ORF Name 



Protein name 



NTID 



AAID 



^? — Score Probability 
Length Length 

733 



Locus Name 



Acc# 



Description 



ORF Name 



NTID 



AAID 



|4119.6.7.:/.^t^I25. I m3 



Protein name 

hypothetical protein jnpi4bb 



Description



— — Score Probability 
Length Length 

i.2e-l!> 



TTT 



Locus Name 



pir:C71806



Acc# 



C71806 






NT 



AA 



ORF Name 
14147280 oa i'M 



NT ID 



sir 



AAID Length Length 




Score Probability 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT 



\ 118? 

i?1 



US; 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 
TT^S 



Score Probability 
|8.2e-63 



Protein name 



Locus Name 



Wbp'U 



Acc# 



AF035937 



Description 

Pseudomonas aeruginosa strain O6 RpsA (rpsA) gene, partial cds;
Ihf-Beta, Wzz (wzz), and Wzx (wzx) genes, complete cds; and wbp gene cluster
for O-antigen biosynthesis, complete sequence.



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 
14"22 — 



FT7T 



Score Probability 
l.^e-06 



Protein name 



Locus Name 



unknown



l gp:U^6V71 



ACC# 



U96771 



Description 



Prevotella bryantii putative polygalacturonase, B-1,4-endoglucanase, and
mannanase genes, complete cds; and unknown genes.



NT 



AA 



ORF Name 



NTID 



[i3.23.26.2...cl...m 



AAID Length Length 



— Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 

run — 



Score Probability 
|i.le-05 



TUT" 



Protein name 



Locus Name 



hypothetical protein



gp:SSU18930



Acc# 



Y18930 



Description 

Sulfolobus solfataricus 281 kb genomic DNA fragment, strain kg,



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


4409462__c2_403 


551 


6l43 




1521 


631 






Protein name 








Locus Name 




Acc# 
F70418 




conserved hypothetical protein aq__l36b




pir:F70418






Description 
















ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


±&$A±b±.±^..±o:i 


922 


6144 


717 ' 


2154 


122 




u . u u u / o 


Protein name 


Locus Name 


ACC# 
AJ002316 




putative peptidyl-prolyl cis-trans isomerase


gp:ASAJ2316






Description 
















"T^cThetobacter sp. ADP1 aiKR & aiKM genes, uRJ 




L 






ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


±125.1B±±£..XtiA 




6145 


427 


1284 


8 9 






Protein name 








Locus Name 




Acc# 
Y18245 




membrane protein








gp:PPUY18245






Description 














Pseudomonas putida todX, todF, todC1, todC2, todB, todA, todD, todE, todG,
todI, todH, todS, todT genes.








ORF Name NT ID 


NT 

AAID Length 


AA 
Length 


Score 


Probability 


4804632_c3_476 924 


£146 218 657 


1119 


2.3e-ii3 


Protein name 


Locus Name 


Acc# 


unknown


gp:AF048749


AF048749 


Description 

Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.




ORF Name NT ID 


NT 

AAID Length 


AA 

— , Score 
Length 


Probability 


STTinOOJfciJS ' 


" 6147 48b 1458 




l.3e-77 



Protein name 



Locus Name 



O-antigen repeat unit transporter wzx 



gp:AF172324



Acc# 
AF172324 



Description 

Escherichia coli GalF (galF) gene, partial cds; O-antigen repeat unit
transporter Wzx (wzx), WbnA (wbnA), O-antigen polymerase Wzy (wzy), WbnB
(wbnB), WbnC (wbnC), WbnD (wbnD), WbnE (wbnE), UDP-Glc-4-epimerase GalE
(galE), 6-phosphogluconate dehydrogenase Gnd (gnd), UDP-Glc-6-dehydrogenase
Ugd (ugd), and WbnF (wbnF) genes, complete cds; and chain length determinan



ORF Name 


NTID AAID 


^ Score 
Length Length 


Probability 


S5.^.0^ii2...±2,.1^.4......... 


92£ 6148 


152 335 


3.4e~40 


Protein name 




Locus Name 


Acc# 






gp:AB017508


AB017508 


Description 








Bacillus halodurans C-125 genomic DNA, 32 kb fragment, complete cds.


ORF Name 


NTID AAID 


^ ^ Score 
Length Length - 


Probability 


sanimjti^A 


927 6149


157 474 614 


7.6e-60


Protein name 




Locus Name 


Acc# 


unknown




gp:AF048749


AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.






ORF Name 



NT ID 



NT — score Probability 

AAID Length Length 



[S.4e-45~ 



Protein name 



sensory transduction histidine kinase
slr2098:protein slr2098 :protein slr2098 



Locus Name 
pir:S75130



Acc# 
S75130 



Description 



ORF Name 



Protein name 



NTID 



g.5.?.ab.b^cL^8. | p? 



AAID 



— — Score Probability 

Length Length 



[2Tu" 



Locus Name 



Acc# 



Description 



NO-HIT 



NT 



ORF Name 



NTID 



AAID Length Length 



— Score Probability 



6.s.7.lB.y.:/...±i....2i^. 



13-8 8 i> 



6.5e-i^ 



Protein name

putative alpha-glucosidase



Locus Name 



gp:AAC252161



Acc# 



AJ252161 



Description 



Alicyclobacillus acidocaldarius maltose/ma
(malEFGR genes, cdaA gene and glcA gene).



ORF Name 



NTID 



— — Score Probability 

AAID Length Length 



\6£.S.l^.±L.±y. I 



| 1.4e-bri 



Protein name 



Locus Name 



sp:YFGB_ECOLI



Acc# 
P36979 



Description 

HYPOTHETICAL 43.1 KD PROTEIN IN NDK-GCPE INTERGENIC REGION






ORF Name 



NT ID 



NT AA 

, , _ — , — Score Probability 
AAID Length Length JL 



6767S37 tl 13 



7F" 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NT ID 



AAID 



6MA16.2...al..A10. I TO 



Length Length 



Score Probability 



"TUT 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



u 



ORF Name 



NT ID 



AAID 



NT AA 

— , — 1 . Score Probability 
Length Length 



mafti&..±i...aa i to 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



h us? 



NT 



AA 



ORF Name 



NT ID 



AAID 



1$£&1S...±1..±S£ I TO 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



a&m&.7....c3L..A7.a.. 



Length Length 



— , Score Probability 



T92 



Protein name 
Description 

NO-HIT



Locus Name 



Acc# 






ORF Name 



9513 cl 473 



Protein name 



NT ID 



NT 



AA 



AAID Length Length 




Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NT ID 



AAID 



conserved hypothetical protein ykgB



Description 



— — Score P robability 
Length Length 



WTT 



5.7e-55. 



Locus Name 



pir:D69856



Acc# 



D69856 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



T757" 



T5" 



T703T" 



Protein name 



Locus Name 



unknown 



gp:U96771



Acc# 



U96771 



Description 



Prevotella bryantii putative polygalacturonase, b-1,4-endoglucanase, and
mannanase genes, complete cds; and unknown genes.



ORF Name 



NTID 



— — , Score Probability 
AAID Length Length 



±&lS2:ibA.±2...12 



MIT 



TJT 



TIT 



Protein name 



Locus Name 



IgA Fc receptor-like protein A428L



pir:T17931



Acc# 



T17931 



Description 







ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— Score 
Length 


Probability 


I2787768_i3J>b 


941


6163






Protein name 








Locus Name 


Acc# 


Description 














NO-HIT 




ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability
2.2e-06


ll&Sl±M„c±J&± 


942 


6164


2F6 | |891 l^ 




Protein name 








Locus Name 


Acc# 










sp:VIRF_YEREN


P13225 


Description 














VIRULENCE REGULON TRANSCRIPTIONAL ACTIVATOR








ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 
1.5e-18


145.ilUB.b....al...ab. 


943


6165 


330 993 z±a 




Protein name 


Locus Name 


Acc# 


[.hypothetical protein FMry. 


5 






T33774 


Description 












ORF Name 


NTID 


AAID 


NT 
Length 


— , Score 
Length 


Probability 


i&aa^ttbL^ti^fixd 


944


6166 


431 1296 x/zi 


" 2.3e-ivv 


Protein name 








Locus Name 


ACC# 


hypothetical protein






pir:JQ1020


JQ1020 



Description 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



FT" 



0.031 



Protein name 



Locus Name 



Acc# 



P36378 



Description 

(OSTEONECTIN) (ON) (BASEMENT MEMBRANE PROTEIN BM-40)



NT 



AA 



ORF Name 



NTID 



AAID 



2346^6yi c3_lil 



6168 



Length Length 
T7W 



Score Probability 



HTO" 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT 



ORF Name 



NTID 



AAID 



NT 
Length 



AA 
Length 



Score Probability 



92 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT



ORF Name NTID 


AAID 


NT AA 
Length Length 


Score 


Probability


119M±o:^.qLA1 948 


6170 | 


711 2136 




S.$e-44 


Protein name 




Locus 


Name 




Acc# 
AJ130872 




receptor antigen (RagA)




| gp:PGI130872 






Description 














Porphyromonas gingivalis W50 receptor antigen (rag) locus encoding a major
immunodominant 55kDa antigen.








ORF Name 



12464067b c2 99 



Protein name 



NT ID 



FT7T 



NT 



AA 



AAID Length Length 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 




950


6172


267 


804 






Protein name 








LOCUS 


Name 


Acc# 


Description 














NO-HIT 












i 


ORF Name 


NT ID 


AAID 


NT AA 
Length Length 


Score 


Probability 




951 


6173


415 


1248 


162


2.3e-17


Protein name 








Locus 


Name. 


Acc# 


receptor antigen (RagA)






| gp:PGI130872 


AJ130872 


Description 




Porphyromonas gingivalis W50 receptor antigen (rag) locus encoding a major
immunodominant 55kDa antigen.






ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 
8.0e-17


16.16.0.6.1^1^.... 


952 


6174 


201 


606 






Protein name 








LOCUS 


Name 


Acc# 



gp:AHU56832



U56832 



Description 

Aeromonas hydrophila PKbOb bindi ng protein (t*pAj gene, compiececas m j . * 
kb fragment . _ 






ORF Name 



NT ID 



~ — Score Probability 

AAID Length Length 



|28i^yi2 c3 110 



1517b 



[46 0 



1 l 1 - 5e " 4 ^ 



Protein name 



Locus Name 



sp:YHAM_ECOLI



Acc# 



P42626 




Protein name 
KIAA0tfV9 protein 



Description 



Homo sapiens 



mRNA tor KIAAOtWiJ protein , complete cds 




ORF Name 



NT ID AAID Length Length



"oTTT 



[ST* 



— , Score Probability 
0.020 



5T" 



Protein name 



Description 



Locus Name 
gp:AFSCR



Acc# 



X70080 



A.franciscana Scr gene (homologue of Drosophila sex combs reduced)



ORF Name 



NT ID 



AAID Length Length 

— 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 







ORF Name NT ID 


NT 

AAID Length 


AA 
Length 


Score 


Probability 


t3T , "54 ^ ^7? TT8 


1137 


388 




Protein name 




Locus Name 


Acc# 


hypothetical protein 




gp:ATH132745


AJ132745 


Description 








1 


Arabidopsis thaliana hypothetical protein, clone EMc^y




1 


ORF Name NT ID 


NT AA 
AAID Length Length 


Score 


Probability 


?"fi04562_c2_103 958 


6180 452


1359 


156 


2.6e-ll 


Protein name 




Locus Name 


Acc# 


putative outer membrane porin




gp:AF030977




Description 




Vibrio cholerae glutamyl tRNA synthetase (gltX) gene, partial cds; putative
outer membrane porin (ompA), unknown protein, vibriobactin receptor precursor
(viuA), and ViuB protein (viuB) genes, complete cds; and VibF (vibF) gene,
partial cds.




ORF Name NT ID 


NT AA Score 
AAID Length Length 


Probability 


4&7.5fli5...ci...aa 95 s | 


6181 193


582 


IbU 


7.4e-14 


Protein name 




Locus Name 


Acc# 


RNA polymerase sigma factor SigZ-like protein | gp:AF137263


AF137263 


Description 

Bacteroides thetaiotaomicron iOS ribosomal protein bio-iiKeprc
gene cluster, and RNA polymerase sigma factor SigZ-like protein (sigZ) genes,
complete cds.




ORF Name NT ID 


NT AA 
— — , Score 
AAID Length Length 


Probability 


ISAlbll^cx^ 960 


6182 377


1131 




1.4e-19


Protein name 




Locus Name 


Acc# 






gp:AF083424


AF083424 


Description 











Ateline herpesvirus 3 complete genome. 






NT 



AA 



ORF Name 



NTID 



AAID 



535ib07 ±2 Jy 



6183



Length Length 
TTT7 — 



Score Probability 



T7W 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT



ORF Name NTID AAID Length Length 


Score 


Probability 
1.3e-07 


EMIS..7.7Zcl...i0.7. 962 6184 352


1059 


147 




Protein name 


Locus Name 


Acc# 


transmembrane sensor


gp:AF051691


AF051691 


Description 




Pseudomonas aeruginosa stress factor A (pstA), ECF sigma factor (fiuI),
transmembrane sensor (fiuR), and hydroxamate-type ferrisiderophore receptor
(fiuA) genes, complete cds.










— — - — Score 
ORF Name NTID AAID Length Length


Probability 


scmia5...c3L...ia& 963 6185 824


2475 


zuy 


1.2e-13 



Protein name 



Locus Name 



serine/threonine protein kinase related
protein



pir:H69064



Acc# 



H69064 



Description 









NT 


AA 

— Score 


Probability 


ORF Name 


NTID 


AAID 


Length 


Length 


10.5.40.b.i....a^...liy. 


964 


6186


297 


894 125


0 ..0(5030 



Protein name 

115K outer membrane protein precursor: SusC
protein



Locus Name 



pir:JC6027



Acc# 



JC6027 



Description 






ORF Name 



10742^2 cl 106 



Protein name 



NTID 



^ — score Probability 

AAID Length Length — 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



NTID 



— — Score Probability 

AAID Length Length 



\m&i&.lv±j&~al i 



1 11980 



[T3Tu~ 



Ii.0e-i3b 



Protein name 



Locus Name 



sp:YFIC_BACSU



Acc# 
P54719 



Description 

HYPOTHETICAL ABC TRANSPORTER ATP-BINDING PROTEIN 2 IN GLVBC 3'REGION



NT 



ORF Name 



NTID 



AAID 



Length Length 



— Score Probability 



F4 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


ibRtkitei.±jkJJj± 


968 6190


574 


1725 


1246 


8.1e-i2/ 


Protein name 






Locus Name 


Acc# 


ABC transporter, ATP-binding protein




"] pIr:E72336 


E72396 


Description 






NT 



ORF Name 



NT ID 



AAID Length Length 



AA 

— Score Probability 



2356280:2 c3 144 



TIT 



1.0e-55 



Protein name 



Locus Name 



sp:SBCD_RHOCA



Acc# 



O68033



Description 



ORF Name 



NT ID 



AAID 



— — Score Probability 

Length Length 



24SSibb7 c'A lib 



TT¥T 



Protein name 



Locus Name 



fibronectin type III



|gp:H0Ml? ' d3A 



Acc# 



MX2549 



Description 



Human fibronectin gene type III homology unit corresponding to
the cell-binding domain, exons 6 and 7.



ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


ll6£6M.h.^lJL±b. 


971 6193


555 


2551 


454 


l.Be-B4 



Protein name 



Locus Name 



probable exonuclease



pir:T03465



Acc# 



T03465 



Description 



ORF Name 



Protein name 



NTID 



— — Score Probability 



AAID Length Length 



98 



Locus Name 



Acc# 



Description 



NO-HIT






ORF Name 


NT ID AAID 


NT AA 

^ N ,.r.. — Score 

Length Length 


Probability 
2.7e-11




pi ST^S 


363 1092 180




Protein name 




Locus Name 


Acc# 


cation efflux system (czcB-like)


pir:C70415


C70415 


Description 


ORF Name 


NT ID AAID 


NT Score 
Length Length ■ 


Probability 




974 6196


195 588




Protein name 




Locus Name 


Acc# 


Description 








NO-HIT


ORF Name 


NT ID AAID 


NT AA Score 
Length Length 


Probability 
2 . 4e-^4 




975 6197


345 ' |i038 2"/5 




Protein name 




Locus Name 


Acc# 


hypothetical protein TM1693




G72223 


Description 


ORF Name 


NT ID AAID 


NT AA 

— Score 
Length Length 


Probability 


3.i5..mi...±a...:Z 


976 6198


' 33G~ 1173 3TT5 


4.2e-27


Protein name 




Locus Name 


ACC# 


["probable phospnoesterase , yKun; 


pir:B69865


B69865 



Description 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



— Score Probability 



34173431 ±1 b 



T3TT 



T3T" 



3.6e-14 



Protein name 



SigX 



Locus Name 
gp:AF115334



Acc# 



Description 

Pseudomonas fluorescens PpsA (ppsA) gene, partial cds; EstX (estX), MenG
(menG), CmaX (cmaX), CrfX (crfX), CmpX (cmpX), SigX (sigX), OprF (oprF), and



ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 


3466l30l_cl_l02 


978


6200


1083 3252


354 


6.3e-53 



Protein name 



Locus Name 



acriflavine resistance protein (acrB) homolog | pir:D70117



Acc# 



D70117 



Description 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 
7.4e-33 


3.S.3.S.2Ib....c^...ll^ 


979 


6201 


550 




1653 


384 





Protein name 

cation efflux (AcrB/AcrD/AcrF family)



Locus Name 



pir:F70368



Acc# 



F70368 



Description 



ORF Name 



Protein name 



Description 
NO-HIT



NTID 



AAID 



±19AL±'^±1J11 



NT 

Length Length 



AA 

— Score 



Locus Name 



Probability 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score 



4805286 ci yy" 



6203 



Probability 
2.7e-50



Protein name 



Locus Name 



acriflavine resistance protein (acrB) homolog | pir:D70117



Acc# 



D70117 



Description 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


5ia£ii:L.ci...im 


982 




450 


1293 


110 


0.0047 


Protein name 








LOCUS 


Name 


Acc# 










sp:YD40_HAEIN


P44165 


Description


HYPOTHETICAL PROTEIN HI1340


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


&&5.1&±b....aL.±±b.. 


983 


6205 


161 


486 






Protein name 








Locus 


Name 


Acc# 


Description 














NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


i0.21^B.:A7....c2...2b.U 


984 


6206 


389 


1170 


2007


1.8e-207 



Protein name 



Locus Name 



Acc# 



putative epimerase/dehydratase



gp:AF125164



AF125164 



Description 

Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes.



NT 



AA 



ORF Name 



NT ID 



AAID 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



±o£Aaa±.±±..A& j ms 



Length Length 



Score Probability 
TST 



b.0e-23 



Protein name 



Locus Name 



hypothetical protein Rv2731 



pir:B70506



Acc# 



B70506 



Description 



NT 



AA 



ORF Name 



NTID 



W5T 



AAID Length Length 



TTT 



Score Probability 
125 



5.0e-08 



Protein name 



Locus Name 



Acc# 



HipA protein. 



gp:D90794 



Description 



E.coli genomic DNA, Kohara clone #303 (34.3-34.6 min.)



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



iftft&afli3....ci...2aa ...j 



zrur 



TUJT 



Protein name 



Locus Name 



putative epimerase/dehydratase 



gp:AF125164



Acc# 



AF125164 



Description 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes. 






NT 



AA 



ORF Name 



NT ID 



110234^2 cl 2Ub 



AAID Length Length 
TItt 



3TT 



Score Probability 
2.7e-215



Protein name 



Locus Name 



putative glycosyltransferase



gp:AF125164



Acc# 



AF125164



Description 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes.



NT 



AA 



ORF Name 



NTID 



ll§39bl 12 hi 



AAID Length Length 
T53 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



±2±&1&X±±± 1ZZ LL ] .L 



991 



AAID Length Length 




Score Probability 
55 



0.031 



Protein name 



Locus Name 



cell cycle progression restoration 8 protein | gp:AF011794



Acc# 



AF011794 



Description 



Homo sapiens cell cycle progression restoration 8 protein (
complete cds.



NT 



— Score Probability 



ORF Name 



NTID 



AAID Length Length 



F2TT 



FT 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT 




ORF Name NTID AAID


NT 
Length 


AA 

— Score 
Length 


Probability 


13804I87_tl_47 993 6215


56 


257 84 


0.0018 




Protein name 




Locus Name 


Acc# 


1 


hypothetical protein 




gp:MTH243656


AJ243656 


Description 






Methanobacterium thermoautotrophicum enD^a, a,
VI, N, 0, P, Q, & ORFS 1,2 & 3. 


C, D, E, f, U, 






ORF Name NTID AAID


NT 
Length 


AA 

— , Score 
Length 


Probability 


T323'0637J:3_147 994 6216


387 


1164 






Protein name




Locus Name 


Acc# 




Description 










NO-HIT










ORF Name NTID AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 




I4^.S.45.0...±3....116. 995 6217


121 


366 100 


2.2e-05


Protein name 




Locus Name 


Acc# 




hypothetical protein tiyll3 3U 






F72267 




Description 










ORF Name NTID AAID 


NT 
Length 


AA 

— n Score 
Length 


Probability 




T±215.15.2..±±.±6:l 996 6218


681 


2046 1133 


7.6e-iib 




Protein name 




Locus Name 


ACC# 




(p)ppGpp synthetase




gp:BSU86377


U86377 




Description 












Bacillus subtilis (p)ppGpp synthetase (relA) and
adenine phosphoribosyltransferase (apt) genes, complete cds.










ORF Name 



114648^0 11JL8 



Protein name 



Description 



Length Length 
1 1888 I 



Score Probability 



Locus Name 



ORF Name 



NT ID 



33F 



Protein name 



AAID 



"hypothetical protein vrvuzzbe 



Description 



ORF Name 



NT ID 



AAID 



VTZT 



Protein name 



ybeB protein homolog 103 ap :procein 
slr!886 :protein slrl886 



Description 

ORF Name 
Protein name 



NT — Score Probability 

ngth — 

TZ1 11^5 I f^" 



0.00067 



Locus Name 
pir:E71620



Acc# 



E71620 



NT ^ Score Probability 

Length Length - - 



|4.3e-I« 



Locus Name 



pir:S77145



Zl 



Acc# 



S77145 



AA 

Length Length 

P73 1 1235 I 



Score Probability 



Locus Name 



Description 



# 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



1570967b ci iyi 



TUUT 



6223



TTuT" 



7W 



2.4e-79 



Protein name 



Locus Name 



tsp:m8_MUCrU 



Acc# 



P71777 



Description 

HY&Ol'HBTicAL 36.3 KB EftoTmti 0^ 7 7 .18 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


l57$0675_cl_19y 


1002 


6224 


401 


1206 


862 


4.0e-86



Protein name 

phosphonopyruvate decarboxylase, fom2



Locus Name 



pir:S60212



Acc# 



S60212 



Description 









NT 


AA 


Score 


Probability 


ORF Name 


NTID 


AAID 


Length 


Length 






lS.5.1146.^Cl».2.7.b; 


1003 


6225


419 


1260 


869 




7.2e-87 



Protein name 



Locus Name 



Acc# 



gp:VBlxiJ*JuLl 



Description 

HYPOTHETICAL! 46 . fa KB PkuTLllN IN LNTLlkCjEflic KauiuN 



NT 



AA 



ORF Name 



NTID 



AAID 



TUUT 



6226 



— — , Score 
Length Length 

fZTW 



840 



Probability 
1.2e-7o 



Protein name 



Description 



Locus Name 



sp:SOJ_BACSU



Acc# 



P37522 



SOJ PROTEIN






NT 



AA 



ORF Name 



NT ID 



AAID Length Length

rrs — 



WTTf 



Score Probability 
T0T6 



1.1e-106



Protein name 

putative undecaprenyl-phosphate



Locus Name 



gp:AF125164



Acc# 



AF125164 



Description 

Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,

complete sequence; and unknown genes. 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


226^642_c2__>i4y 


1006 | 


6228




1068 465 


4.7e-44


Protein name 








Locus Name 


Acc# 



putative glycosyltransferase



gp:AF125164



Description 

Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,

complete sequence; and unknown genes. 







NT 


AA 

— Score 


Probability 


ORF Name 


NTID AAID 


Length 


Length 


2.3A3.0.a7.6....cd...ZB.y. 


1007 6229 


508


1527 195 


3.4e-ia 



Protein name 



Locus Name 



putative flippase



gp:AF125164



ACC# 



AF125164 



Description 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes. 



NT 



AA 



ORF Name 



NTID 



AAID 



IMF 



Length Length 
TTJTS — 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT 




ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


235545bb_t3_14iS 


1009 


6231 


254 








Protein name 










Locus Name 


Acc# 


Description 
















NO-HIT


ORF Name 


NT ID 


AAID 


NT 
Length 




AA 
Length 


Score 


Probability 


\115^±±\L±1..±±^ 


1010 




119 








Protein name 










Locus Name 


Acc# 


Description 
















NO-HIT


ORF Name 


NT ID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


23.&3.5.a5.2...Gl...lk2. 


1011 


6233 


333 


1002 




6.7e-11


Protein name 










Locus Name 


Acc# 


dolichol-P-glucose synthetase homolog




I 


pir:E69322


E69322 


Description 


ORF Name 


NT ID 


AAID 


NT 
Length 


AA 

— „ Score 
Length 


Probability 
1.4e-140 


2163£2b:^a2Jl± l ± 


1012 


6234 


444 


1335 


ij /b 




Protein name 










Locus Name 


Acc# 


phosphoenolpyruvate phosphomutase FOM1


i 


pir:^60^0b 




Description 


ORF Name 


NT ID 


AAID 


NT 
Length 


AA 

— Score 
Length 


Probability 
1.7e-37 


ll&.l&&l...al...l±b. 


1013 


" 6235 


383 


1152 


4U3 




Protein name 


Locus Name 


Acc# 


hypothetical protein








pir:S76344 


S76344 



Description 






NT 



AA 



ORF Name 



NTID 



124017687 t'i lb2 



TDTF 



AAID Length Length 




Score Probability 
5.4e-27 



KM 



Protein name 

Description 
SYNTHASE) 



Locus Name 



Acc# 



sp:CDSA_HAEIN



NT 



AA 



ORF Name 



NTID 



.2422656? c2 241 



TuTT 



AAID Length Length 
545 



^T7 



Score Probability 
1.1e-26



TOT" 



Protein name 



Locus Name 



activator protein 



gp:AF047527



Acc# 



AF047527



Description 

Pseudomonas fluorescens activator protein (mtlR) gene, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 
524 



Score Probability 



TTTT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



ORF Name 



NTID 



AAID 



NT 
Length 



AA 
Length 



— — Score Probability 



2.i411b.:/.7...±l...M 



TTTTT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



TuTT 



6240 



3.8e-06 



Protein name 



Locus Name 



galactosyltransferase homolog



pir:G69465



Acc# 



G69465 



Description 



ORF Name 



NTID 



— — Score Probability 
AAID Length Length 



6241



2.0e-11



Protein name 



Locus Name 



capsular polysaccharide biosynthesis protein | pir:F70441



Acc# 



F70441 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



STB" 



Score Probability 
0.011 



Protein name 



Locus Name 



probable membrane protein
YOL019w: hypothetical protein O2313



pir:S66701



ACC# 



S66701 



Description 



ORF Name 



NTID 



TUTT 



Protein name 



NT 



AA 



AAID Length Length 



Score Probability 



J7T 



TTTe^TS" 



Locus Name 



Acc# 



O29973



Description 
HYPOTHETICAL PROTEIN AV02bfe



NT 



AA 



ORF Name 



NTID 



TUTT 



AAID Length Length 



Score Probability 



6244 



Protein name



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



TUZT 



Length Length 

srn — 



Score 



Probability 
6.7e-52. 



Protein name 



Locus Name 



Acc# 



P31857 



Description 
HYPOTHETICAL $2 A KB frftCfftild itl cilbB 



7 



NT 



AA 



ORF Name 



NTID 



AAID 



1024 



Length Length 



Score 



Probability 
1.7e-17



Protein name 



Description 



Locus Name 



sp:Y665_HAEIN



Acc# 



P44033 



HYPOTHETICAL PROTEIN HI0665



□ 



ORF Name 



NTID 



AAID 



NT AA 
— — n Score 
Length Length 



1443 



Protein name 



Description 



Locus Name 



Probability 



Acc# 



NO-HIT 






ORF Name 



NTID 



NT AA 

— — Score Probability 
AAID Length Length 



33376906 c3 2$0 



SSI" 



T5T 



5.6e-32



Protein name 



Locus Name 



LicD1



gp:AF106539



Acc# 



AF106539 



Description 



Streptococcus pneumoniae LicD1 (licD1) and LicD2 (licD2) genes, complete
cds; and unknown gene.





ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability


3340«67J:2_&2 


1027 


6249


925


2778


12$ 




6.1e-05



Protein name 



Locus Name 



115K outer membrane protein precursor: SusC
protein



pir:JC6027



Description 



Acc# 



JC6027 



NT 



AA 



ORF Name 



NTID 



3A&3.M&.1...C.1...ZUO.., 



AAID Length Length 



1017 



Score Probability 
1.2e-31



Protein name 



Locus Name 



putative alcohol dehydrogenase



gp:CZA382



Acc# 



AL078635 



Description 
Amycolatopsis orientalis cosmid PCZA382.



NT 



AA 



ORF Name 



NTID 



3.5.3.9.5.S..7.fo....a3....2.y.b.., 



TUTT 



AAID Length Length 




Score Probability 
1.4e-211



Protein name 



Locus Name 



Acc# 



putative epimerase 



gp:AF125164



AF125164 



Description 

Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,

complete sequence; and unknown genes. 






ORF Name 



NT ID 



NT AA 

— , — , Score Probability 
AAID Length Length J ~ 



35401627 C3 288 



1030 



T¥T" 



l.^e-40 



Protein name 



Locus Name 



WcgF 



gp:AF125164



Acc# 
AF125164 



Description 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes. 



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length JL 



cl 207 



2 .2e~2S 



Protein name 



Description 



Locus Name 



gp:AB008550



Acc# 



AB008550 



Pseudomonas aeruginosa phage phi CTX, complete genome sequence. 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



iaii4fl2L,c3L..itt7. I \nn7 



TFT" 



5.8e-14



Protein name 



Locus Name 



unknown



gp:AF125164



Acc# 



AF125164 



Description 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes. 



NT 



AA 



ORF Name 



3.243..7.5.3....C2...245.. 



NTID AAID Length Length 




TTHT 



Score Probability 
TT73 



3.3e-130



Protein name 



Locus Name 



glucose-1-phosphate thymidyltransferase



gp:AF125164



Acc# 



AF125164 



Description 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes. 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



3555062 c3 299 



7W 



4.6e-S>2 



Protein name 



Locus Name 



unknown 



gp:AF125164 



Acc# 



AF125164 



Description 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes. 



ORF Name 



NT AA 

— — • Score Probabil ity 
NTID AAID Length Length 



3991300 c3 2b8 



l.le-37 



Protein name 



Locus Name 



stationary phase survival protein SurE



pir:A70372



Acc# 



A70372 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



0.00012 



Protein name 



Locus Name 



unknown 



gp:AF048749



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



ORF Name 



NTID 



NT AA 

— " — Score Pro bability 
AAID Length Length 



\A±lS2B.b..±l..±b.L. 



TUJT 



4.3e-148 



Protein name 



Locus Name 



FtsH2 



gp:AB023310 



Acc# 



AB023310 



Description 

Cyanidioschyzon merolae gene for FtsH2, complete cds.






NT 



AA 



ORF Name 



4304812 C2 246 



NTID AAID Length Length 

\&zm — 



[ITO 1 POT 



Score Probability 
3.7e-51



Protein name 



Locus Name 



WcgG 



gp:AF125164



Acc# 



AF125164 



Description 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes. 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



c3 257 



TTTTT 



2.5e-100



Protein name 



Locus Name 



putative acetyltransferase



gp:AF125164



Acc# 



AF125164 



Description 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes.



NT 



AA 



ORF Name 



\&B3£2TL.z±..±&2.. 



NTID AAID Length Length 

£T3 



6262 



TOT" 



Score Probability 
0.0062



Protein name 



Description 



Locus Name 



gp:YP102KB 



Acc# 



AL031866 



Yersinia pestis 102 kbases unstable region: from 1 to 119443.



NT 



AA 



ORF Name 



4;S.9..7.12.S....al...2.0.1.. 



NTID AAID Length Length 

6263



1041 



Score Probability 
l.Se-26 — 



Protein name 



Locus Name 



N-acetylglucosaminyltransferase



gp:AB017355



Acc# 



AB017355 



Description 



Streptococcus agalactiae DNA, cps (capsular polysaccharide) genes, partial
and complete cds.






NT 



AA 



ORF Name 



NT ID 



AAID 



Length Length 

rupn — 



Score Probability 
7.4e-117



Protein name 



Locus Name 



Acc# 



X-His dipeptidase: aminoacylhistidine
dipeptidase: aminopeptidase
D: beta-alanyl-histidine



pir:JU0300



Description 



NT 



AA 



ORF Name 



NT ID 



4$527£0 c2 233 



AAID Length Length 



Score Probability 



6.8e-20



Protein name 



Locus Name 



hypothetical protein



pir:E72310



Acc# 



E72310



Description 



ORF Name 


NT ID 


AAID 


NT 
Length 


AA 0 
— ■ , Score 
Length 


Probability 


5±16Al±...c±Jl&<i.... 


1044 


6266 


362 


1089 178 


5.4e-11



Protein name 



Locus Name 



capsular polysaccharide biosynthesis homolog
yveQ



pir:F70036



Acc# 



F70036 



Description 



NT 



AA 



ORF Name 



NT ID 



[Torr 



AAID Length Length 

— 



Score Probability 
3.0e-2b ' 



Protein name 



Locus Name 



hypothetical protein APE2014 



pir:H72504



Acc# 



H72504 



Description 






ORF Name 



NTID 



5275281 tl 45 



Protein name 



probable membrane-bound lytic murein
transglycosylase D (dniR)



Description 



NT 



AA 



AAID Length Length 



Score 



1323 



T75~ 



Locus Name 



pir:H71301 



Probability 
1.3e-38



Acc# 



H71301 



ORF Name 



Protein name 



NTID 



TuTT 



AAID 



NT AA 

— — Score Probability 
Length Length 



TTSTT 



Locus Name 



Acc# 



Description 
NO-HIT 



ORF Name 



Protein name 



NTID 



NT 



AA 



AAID Length Length 



Score Probability 



fTTT 



Locus Name 



9.6e-r; 



Acc# 



Description 



sp:KSGA_MYCCA



P43038 



ORF Name 



&8.127.5.7...±3....1d<L 



Protein name 



NTID 



NT AA 

— — , Score Probability 
AAID Length Length 



ZTTT 



W7T 



Locus Name 



3.0e-65 



Acc# 



Ykok 



Description 



gp:AB013374 



AB013374 



Bacillus halodurans C-125 mamX, yuciA, yKoK and yvtK genes, partial and complete cds.






NT 



AA 



ORF Name 



NTID 



AAID 



1050 



ZTTT 



Length Length 



Score Probability 
Tu~5T5 



Protein name 



Locus Name 



gp:AOPCZA361 



Acc# 



AJ223998 



Description 



Amycolatopsis orientalis cosmid PCZA361. 



NT 



AA 



ORF Name 



NTID 



c2 i^S 



AAID Length Length 




Score Probability 



TT¥TT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 

— 



Score Probability 



1.1e-200



Protein name 



Locus Name 



putative aminotransferase 



gp:AF125164



Acc# 



AF125164 



Description 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes.



ORF Name 



NTID 



AAID 



NT AA „ , _ , _ , 

— , — , Score Probability 
Length Length 



aam2...cii,„m I irror 



MIT 



TT¥" 



Protein name 



Locus Name 



unknown 



gp:AF068902



Acc# 



AF068902 



Description 



Streptococcus pneumoniae D-glutamic acid adding enzyme MurD
(murD), undecaprenyl-PP-MurNAc-pentapeptide-UDPGlcNAc GlcNAc
transferase (murG), cell division protein DivIB
(divIB), orotidine-5'-decarboxylase PyrF (pyrF), and
orotate phosphoribosyltransferase PyrE (pyrE) genes, complete cds; and unknown






ORF Name 



9944428 13 97 



Protein name 



NTID 



TOST" 



AAID 



NT AA 

— — , Score Probability 
Length Length 



TulT 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



ZTTT 



NT 



AA 



Length Length 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 






ORF Name 



Protein name 



NT 



AA 



NTID 



AAID Length Length 
— 



F^7¥ 



Score Probability 
T3T" 



6.3e-06



Locus Name 



receptor antigen (RagA) 



gpTroTTTD¥7T" 



Acc# 



AJ130872 



Description 



Porphyromonas gingivalis W50 receptor antigen (rag) locus encoding a major
immunodominant 55kDa antigen.



fi 



NT 



AA 



ORF Name 



TUT7 



NTID AAID Length Length 
W?T5 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 

NO-HIT






ORF Name 



16211277 13 28 



Protein name 



NT ID 



NT AA 
Score
AAID Length Length 



6280 



¥77 



Locus Name 



Probability 



Acc# 



Description 
NO-HIT



ORF Name 



NTID 



±£AM1&1..±±...6. I 



Protein name 



NT 



AA 



T — ^ _ — Score Probability 
AAID Length Length — J - 

wzz — 



TBT" 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



NTID 



NT AA ^ n , _ , _ , fc 
_ — ■ T — Score Probabilxty 
AAID Length Length — — — — — ^~ 



2A6A19.1B....L1...5. 1 



1060 



TIT 



TTTT 



Protein name 



Locus Name 



muramoyl-pentapeptide carboxypeptidase



pir:T34747



Acc# 



T34747



Description 



ORF Name 



NTID 



NT AA 

— - , — , Score Probability 
AAID Length Length — — — J - 



2&5£&1±1..±1...1± 



Protein name 



Locus Name 



6.3e-14



Acc# 



slow myosin heavy chain 2 



gp:GGU85023



Description 



U85023 



Gallus gallus slow myosin heavy chain 2 (SM2) mRNA, partial cds.



ORF Name 



Protein name 

Description 
NO-HIT 



NT 



AA 



NT ID 



AAID 



Length Length 



Score Probability 



Locus Name 



Acc# 



ORF Name 



Protein name 



NTID 



AAID 



Hypothetical protein jhp0052 



Description 



NT 



AA 



Length Length 

\nrs~ 



Score Probability 
TIE 



Locus Name 



pir:F71980



0.00017 



Acc# 



F71980 



ORF Name 



Protein name 

Description 
NO-HIT



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length — JL 



TOW 



Locus Name 



Acc# 



ORF Name 



Protein name 

Description 
NO-HIT 



NTID 



NT AA 

' \^ T — T — ^, Score Probability 
AAID Length Length ~ ^ 



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NTID 



10755437 fl 11 



1055 



AAID Length Length 




TEW 



Score Probability 
£3T5 



1.9e-24 



Protein name 

Description 
BKD OPERON TRANSCRIPTIONAL REGULATOR



Locus Name 



sp:BKDR_PSEPU



Acc# 



P42179 



NT 



AA 



ORF Name 



NTID 



1175211 ±3 35 



AAID Length Length 

wzzs — 



Score Probability 
4.6e-53



Protein name 



Locus Name 



inner membrane ABC transporter 



gp:AF213822



Acc# 



AF213822 



Description 



Zymomonas mobilis strain ZM4 fosmid clone 42B3, complete sequence.



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length 



122.7.2127...±1..A0.. 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



±&21A11.±±..A I [TTO 



NTID AAID Length Length 

wiwi — 



1827 



Score Probability 
37T 



4 .4e-S7 



Protein name 



Locus Name 



gp:YP102KB 



Acc# 



AL031866 



Description 

Yersinia pestis 102 kbases unstable region: from 1 to 119443,






NT 



AA 



ORF Name 



NT ID 



1070 



AAID Length Length 

-^m — 



Score Probability 
T73 



: ^.6e-34 



Protein name 



Description 



Locus Name 



sp:YBDM_ECOLI



Acc# 



P77174 



HYPOTHETICAL 23.9 KD PROTEIN IN CSTA-DSBG INTERGENIC REGION 



NT 



AA 



ORF Name 



NT ID 



15500317 tl 5 



TTT7T" 



AAID Length Length 
6293 



Score Probability 
7T5 



1.6e-27



Protein name 



Locus Name 



NrpB 



gp:PMU46488



Acc# 



U46488 



Description 



Proteus mirabilis NrpS (nrpS) gene, partial cds, NrpU (nrpU), NrpT (nrpT),
NrpA (nrpA), NrpB (nrpB), NrpG (nrpG) and IrpP (irpP) genes, complete cds.



NT 



AA 



ORF Name 



2D.5.S.0.D.aD...±1...15..: I 



NTID AAID Length Length 

15273 — 



TIT 



JET 



Score Probability 
135 



4.0e-08



Protein name 



Locus Name 



60kDa protein



gp:AB004560 



ACC# 



AB004560 



Description 

Porphyromonas gingivalis DNA for 60kDa protein, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



TUTT 



Length Length 



Score Probability 



T7T 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



ORF Name 



122328450 tl 16 



Protein name 



NT ID 



1074 



AAID 



NT 



AA 



Length Length 



Score Probability 



TFT" 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 

Description 
HYPOTHETICAL 47,



NT 



AA 



NT ID 



AAID 



T07F" 



Length Length 



3TT 



Score Probability 
HT7 



Locus Name 



sp:YBDM_ECOLI



Acc# 



P77216 



fi KD PROTEIN IN CSTA-DSBG INTERGENIC REGION



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length J ~ 



4ift&xaix...ci...aa .1 ituT^ 



0.0045 



Protein name 



Locus Name 



MHC class II alpha chain 



gp:AF091557



ACC# 



AF091557 



Description 



Aulonocara hansbaenschi MHC class II alpha chain MHC-Auha-DAA1
mRNA (MHC-Auha-DAA1*01 allele), complete cds.



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length — 



a^43.i5.a:z...c2....7.s I rnrrr 



TUT 



TTT 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT 



NT 



AA 



ORF Name 



24452078 tl 1 



NTID AAID Length Length 
CTTJG 



Score Probability 



Tim" 



JUT 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


2a5aaaa3.jta.Jia 


1074 


6301 




317 


954 


142 




l.Ve-07 



Protein name 



Locus Name 



pobR regulator 



gp:PSEY18527



Acc# 



Y18527 



Description 



Pseudomonas sp. pobA, pobR, pcaQ, pcaH and pcaG genes.



ORF Name 



NTID 



NT AA 
— — Score 
AAID Length Length 



7TT 



Probability 
4.7e-12



Protein name 



Description 



Locus Name 



Acc# 



Y07639 



L.ivanovii 23S rRNA, 5S rRNA, tRNA-Asn, tRNA-Thr, ORF Z, mlD, and mlC
genes.



NT 



AA 



ORF Name 



NTID 



AAID 



m3.3.0.5.7....c.l...5.2. 



1081 



FIST" 



Length Length 



Score Probability 
1.5e-09



Protein name 
Description 

THERMOREGULATORY PROTEIN LCRF



Locus Name 



sp:LCRF_YERPE



Acc# 



P28808 



NT 



AA 



ORF Name 



NT ID 



AAID 



356S0462 13 43 



_ — ^, _ — Score Probability 
Length Length s ~ 

4.8e-22 



1082 6404 


200 


£03 




272 





Protein name



60kDa protein



Locus Name 
gp:AB004560



Acc# 



AB004560 



Description 

Porphyromonas gingivalis DNA for 60kDa protein, complete cds.



NT 



AA 



ORF Name 



NT ID 



|406$1§0 ill 14 



„ „ „ — ^ — . Score Probability 
AAID Length Length — JL 

STUS 



T5T 



5ST 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



ORF Name 



NT AA 

. Tm „ ,, Tn T — ^, _ — _ Score Probabi lity 
NT ID AAID Length Length z - 



T01T3~ 



Protein name 



Locus Name 



Acc# 



lipase precursor 



gp:AF053006



AF053006 



Description 

Staphylococcus epidermidis lipase precursor (gehl) gene, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



4y.5.^6.z...r.i....d I 11085 



Length Length 



Score Probability 
^ 



7.3e-23 



Protein name 

Description 
(EC 2.1.1.-)



Locus Name 



sp:TCMP_STRGA



ACC# 



P39887 






ORF Name 



NTID 



NT AA 

— n — Score Probability 
AAID Length Length JL 



5260317 c2 80 



15208 



TFT 



53" 



0.042 



Protein name 



Locus Name 



Acc# 



pqqG protein 



Description 



pir:B55527



B55527 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length J - 



ST0T 



Protein name 



ITT 



81 



Locus Name 



0.0088 



Acc# 



hypothetical protein MTH1102 



Description 



pir:F69013



F69013 



ORF Name 



NTID 



NT AA „ ^ ^ 
— , — , Score Probability 
AAID Length Length 



mA120.2...al..±Ox I ITBTO 



Protein name 



5TT 



153%- 



\TTT 



Locus Name 



ACC# 



sensory transduction histidine kinase
sll0474 :protein sll0474 :protein sll0474



pir:S76650



Description 



S76650 



NT 



AA 



ORF Name 



NTID 



±±122A±i...z1..ao. I lira? 



AAID Length Length 
STII 



Score Probability 
121 



Protein name 



Locus Name 



unknown 



gp:U96771



Description 



2.8e-06 



Acc# 



U96771 



Prevotella bryantii putative polygalacturonase, B-1,4-endoglucanase, and
mannanase genes, complete cds; and unknown genes.



NT 



AA 



ORF Name 



NTID 



15501526 cl 



AAID Length Length 

mn — 



H089 



3270 



Score Probability 
7.4e-91



Protein name 



Locus Name 



receptor antigen (RagA)



fgpTPCTIlW7T" 



Acc# 



AJ130872 



Description 



Porphyromonas gingivalis W50 receptor antigen (rag) locus encoding a major
immunodominant 55kDa antigen.



NT 



AA 



ORF Name 



NTID 



AAID 



TUWT 



WTTT 



Length Length 



Score Probability 
" 



Protein name 



Locus Name 



unknown 



gpTTTSTTTT 



Acc# 



U96771 



Description



Prevotella bryantii putative polygalacturonase, B-1,4-endoglucanase, and
mannanase genes, complete cds; and unknown genes.



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length ^ 



11&1%±11.±1...1\L„ |'[TU52 



TUT" 



0.024 



Protein name 



Locus Name 



hypothetical protein ytaP 



pir:B69988



Acc# 



B69988 



Description 



ORF Name 



2.4.^5.1.5.1. 2.*.. C.3.... 4.5... 



Protein name 



unknown 



Description 



NTID 



NT 



AA 



AAID Length Length 




TZTT 



Score Probability 

run — 



Locus Name 



gp:U96771



5.0e-07



ACC# 



U96771 



Prevotella bryantii putative polygalacturonase, B-1,4-endoglucanase, and
mannanase genes, complete cds; and unknown genes.






ORF Name 



NT AA Score Probability

AAID Length Length



Acc# 




ORF Name - — - 1 r r 5e-b(T 

I ■ ■ ■ ' LocusJ?ame ft cc i 

Protein name 

j ^hypotheLxcai pruLei.11 l AUI ^ 



NTID 



AAID Length Length 



C75064 







ORF Name 



NTID 

mm 



NT AA score Probability 

AAID Length Length 



[3W 



1 



I9.2e-6B" 



Protein name 



Locus Name 
|gp:ijMkJ^AM5SEj 



Acc# 
D28493 




NTID AAID Length Length 
ORF Name £Li±H 



Score Probability 
ZT9 



Protein name 
"TruB 



Locus Name 



Acc# 



] [ gp:A^ib^b/ ~2 AF169967 



Description r j_ g d i poi^ U j^l*) 

NT score Probability 

AAID Length Length — 



ORF Name 



NTID 



^wgEDED EE 

— ■ ~ T.onus NatT 



i . 5e-u^ 



Protein name 



Locus Name 



ACC# 
P50069 




ORF Name 



NTID 



TToT" 



AAID Length Length 



[7T 



Protein name 
Description 



Locus Name 



Acc# 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length jL 



14882902 cl 108 



1102 



1146 



TTT 



Protein name 



Locus Name 



sensory transduction system regulatory
protein slr1837 :protein slr1837 :protein
slr1837



Description 



pir:S77341 



0.00018 



Acc# 



S77341 



NT 



AA 



ORF Name 



NTID 



AAID 



1S105577 cl 115 



TTOT" 



Length Length 
T^> 



Score Probability 
0.0011



TT57T 



Protein name 



Description 



Locus Name 



sp:HEM4_SCHPO



Acc# 



P87214 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



2.0.1.7.6.8..7.8....C3....15.6.. 



1104 



[TuTT 



0.000S3 



Protein name 



Locus Name 



ATPase subunit 6



gp:TCU40265 



Acc# 



U40265



Description 



Trypanosoma cruzi ATPase subunit 6 mRNA, complete cds.



NT 



AA 



ORF Name 



NTID 



2M.7..7.5....C3....15.1 1 ITTTC 



AAID Length Length 





Score Probability 
3.7e-44



Protein name 



Locus Name 



FtsX



|gp:AF169967 



Acc# 



AF169967 



Description 



Flavobacterium johnsoniae LeuS (leuS) gene, partial cds; and Fjol2 (fjol2),
FtsX (ftsX), Fjol3 (fjol3), BacA (bacA), and TruB (truB) genes, complete cds.






NT 



AA 



ORF Name 



NTID 



^11603 7 cl 107 



1106 



AAID Length Length 
'ZTTZ 



Score Probability 



TTT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



|2.1b^215.0....c2...141 1 [TTU7 



AAID Length Length 




WTT 



Score Probability 




1.4e-110



Protein name 



Description 



Locus Name 



sp:METK_HAEIN



Acc# 
P43762 



ADENOSYLTRANSFERASE) (ADOMET SYNTHETASE)



NT 



AA 



ORF Name 



NTID 



AAID 



Z3A4;Z1.7.5....al...l22 1 11108 



Length Length 



Score Probability 
1071 



2.&e-10§ 



Protein name 



Description 



Locus Name 



sp:SYY_BACST 



Acc# 



P00952 



TYROSYL-TRNA SYNTHETASE (TYROSINE--TRNA LIGASE) (TYRRS)



NT 



AA 



ORF Name 



NTID 



AAID 



23A4.7.Q3.:U.al...l0.9. I 11105 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 






ORF Name 



Protein name 



NT ID 



1110 



AAID 



NT AA 

— , — • , Score Probability 
Length Length 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 

Description 
NO-HIT 



NT 



AA 



NT ID 



AAID 



TTTT" 



Length Length 
TTu" 



Score Probability 



Locus Name 



Acc# 



ORF Name 



Protein name 



Description 



NT 



AA 



NT ID 



AAID 



1112 



6334 



Length Length 



Score Probability 



2031 



Locus Name 



Acc# 



NO-HIT



ORF Name 



2A4D.6.25.3....G3....:L&.3.. 



Protein name 



NT 



AA 



NTID 



AAID 



TTTT 



6335 



Length Length 
S3 - 



Score Probability 
1.5e-09



TF2 



Locus Name 



oxidoreductase, short chain
dehydrogenase/reductase family



pir:A72395



Acc# 



A72395 



Description 



ORF Name 



NTID 



NT 



AA 



AAID Length Length 



Score Probability 



26605287 ci 114 



Protein name 



TTT¥" 



6336 



Locus Name 



Acc# 



Description 



sp:BACA_ECOLI



(EC 2.7.1.66)



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



25505427 tl 14 



Protein name 



1115 



7TT 



0.053 



Locus Name 



Acc# 



hypothetical protein A635R



Description 



pir:T18137



T18137 



ORF Name 



NTID 



AAID 



NT AA 
t ™ f u t Score Probability 

Length Length 



2.5.5.3.2S.iS....al...lB.l.. 



Protein name 



TITS" 



633S 



1ST" 



Locus Name 



Acc# 



Description 



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length 



11215A10....al..±6.± I (TTT7 



Protein name 



1584 



Locus Name 



3.5e-42 



Acc# 



choline sulfatase



gp:RMU39940



U39940 



Description 

Sinorhizobium meliloti bet operon, complete sequence.



ORF Name 



J42516J7 t'2 VU 



Protein name 



NT ID 



TTTF" 



AAID 



NT 



AA 



Length Length 
T73 



Score Probability 



5T7 



Locus Name 



Acc# 



Description 



NO-HIT






NT 



AA 



ORF Name 



NT ID 



AAID 



3LS7.aiA16L...cl...lQ5 1 ITTT? 



Length Length 



Score 



Im- 



Probability
2.4e-35



Protein name 



Locus Name 



putative secreted beta-galactosidase 



Acc# 



AL133171 



Description 



Streptomyces coelicolor cosmid F81. 



ORF Name 



NT ID 



NT AA 
— — Score 
AAID Length Length 



T7B" 



Probability 
|2.5e-±3 



Protein name 



Locus Name 



T30TT 



gp:AF169967



Acc# 



AF169967 



Description 



Flavobacterium johnsoniae LeuS (leuS) gene, partial cds; and Fjol2 (fjol2),
FtsX (ftsX), Fjol3 (fjol3), BacA (bacA), and TruB (truB) genes, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



\16All&&i...al..±±& 



1121 



Length Length 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT






NT 



AA 



ORF Name 



NTID 



AAID 



3937750 cl iiO 



TTZT 



Length Length 




Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



ia42.flfl2...cli.„14fl 



TTT7T* 



1.4e-74



Protein name 



Locus Name 



S-adenosylmethionine tRNA ribosyltransferase



pir:A72360



Acc# 



A72360



Description 



NT 



AA 



ORF Name 



NTID AAID Length Length 




283 



Score Probability 
735 



1.1e-72



Protein name 



Description 



Locus Name 



sp:KDUI_ERWCH



Acc# 



Q05529 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



TTT5~ 



WITT 



TTF 



i.6e-..li 



Protein name 



Locus Name 



HI0454



IgpTAFTTTT^T 



Acc# 



AF174390 



Description 

Haemophilus influenzae strain Rd KW20 HI0454 gene, partial cds.






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



141^^4^^ c3 157 



1126 



7S~ 



2"2T 



Protein name 



Locus Name 



conserved hypothetical protein 



pir:G72251 



Acc# 



G72251 



Description 



ORF Name 



NTID 



NT AA 

— , — • Score Probability 
AAID Length Length wL ~ 



\±33.B3.B.2....C.X...Xm I 11127 



6349 



tzur 



2.1e-34



Protein name 



Locus Name 



conserved hypothetical protein yvdD 



pir:D70033



Acc# 



D70033 



Description 






ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length — 



4.7.S.7....c3....lSi I [XT2F 



T71T 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



ORF Name 



NTID 



AAID 



NT AA 

— - , — , Score Probability 
Length Length 



\&&6.11A2.±1„3A I fll^ 



6351 



TZTT 



TUT 



1.1e-38



Protein name 



Locus Name 



hypothetical protein C0624 



pir:S73091



Acc# 



S73091 



Description 



ORF Name 



Protein name 



NTID 



response regulator 



NT 



AA 



AAID Length Length 



T77T 



Score Probability 
1.7e-09



T&2 



Locus Name 



Acc# 



gp:SPAJ6398



AJ006398 



Description 

Streptococcus pneumoniae rr09 and hk09 genes; two component system 09.






NT 



AA 



ORF Name 



NTID 



AAID 



I4876B15 c2 114 



TUT 



Length Length 



Score Probability 



TUT 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID AAID Length Length 

TTT2 1 — 



Score Probability 



1 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID AAID Length Length 
TU3 1 \61>SB I 1 [TWO — 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID AAID 



1 [STSF 



Length Length 



Score Probability 



Protein name 
Description 

fraiT 



Locus Name 



Acc# 



ORF Name 



NT AA 

-71 -pys _ — _ ~ — _ Score Prob ability 
NTID AAID Length Length JL 



^llbl^l^llZ I [1135 I [6357 I [ST? 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






ORF Name 



NTID 



55921^8 cl lub 



TTTS" 



Protein name 



AAID 



hypothetical protein PHU^tfi



Description 



— — Score Probability 
Length Length 

4.7e-06



TIT 



Locus Name 
pir:D71453



Acc# 



D71453 



ORF Name 



Protein name 
Description 



NT 



AA 



NTID 



TUT 



AAID Length Length 
TTTT 



Score Probability 



Locus Name 



Acc# 



NO-HIT



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


maittiJtjL^ai | 


1138 


6360 


65 


193 






Protein name 








LOCUS 


Name 


Acc# 


Description 














NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


£mm...ci...m 




| 5351 


154 


465 


326 


2.5e-2y 


Protein name 








Locus 


Name 


ACC# 










sp:HPPR 




O83019



Description 



ORF Name 



NTID 



7315641 ci il± 



Protein name 



ubiquinone/menaquinone biosynthesis
methyltransferase-related protein



Description 



NT 



AA 



AAID Length Length 



Score Probability 



TTT 



Locus Name 



2.7e-07 



Acc# 



F72262 



ORF Name 



Protein name 



NTID 



|10.im!^.u..±3....fo.4 1 [ITiT 



— — Score Probability 
AAID Length Length 



I53B3" 



UT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



1142 



AAID 



NT 



AA 



Length Length 

— 



Score Probability 



irnr 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



\±&Ab:X±BA..±l...L^ I 



TT3T" 



TFTT 



T5T 



FIT 



0.020"" 



Locus Name 



WW domain binding protein b



gp:MMU92454



Acc# 



U92454 



Description 

Mus musculus WW domain binding protein b mRNA, partial cds



ORF Name 



NTID 



AAID 



NT AA 

— ^ — Ll Score Probability 
Length Length • L - 



I20S4768 r3 57 



1144 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



I23..7.0.5.0.0.2..±1...10. I tnT5 



6367 



Length Length 



Score Probability 
2T3 



1.2e-29 



Protein name 



Locus Name 



conserved hypothetical protein ylbK 



pir:H69874



Acc# 



H69874 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



1&&Z26.16...±L.±5. I [TI¥£ 



Length Length 
1023 I \T(TT2 



Score Probability 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



TTT7" 



Length Length 
3075 



Score Probability 
2T2I — 



1.2e-221



Protein name 



Locus Name 



hypothetical protein mexF 



pir:T30830



Acc# 



T30830 



Description 



ORF Name 



NTID 



\Z±6A1B.0A±1...11 1 [TTTO 



Protein name 



NT AA 

— — ^ Score Probabili ty 
AAID Length Length i - 



[£T7TT 



7TT 



2.le-5B 



Locus Name 



sp:YAW_ECOLI



Acc# 
Q47679 



Description 

HYPOTHETICAL 23.2 KD PROTEIN IN DNAQ-GMHA INTERGENIC REGION






ORF Name 



lTmrr , , — , , ^ — ^, Score Probability 
NT ID AAID Length Length - L 



^5375307 tl 27 



TIT 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



\2££.&.0.1&D..±±..±& I [TT5TT 



AAID Length Length 
6372 



Score Probability 



T2T 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
— 



TUT 



Score Probability 
3TT5 



3.5e-46



Protein name 



Description 



Locus Name 



sp:ACRE_ECOLI 



ACC# 



P24180 



ACRIFLAVIN RESISTANCE PROTEIN E PRECURSOR (ENVC PROTEIN)



NT 



AA 



ORF Name 



NTID 



3.U£B.3.1f5,2....G2..,ia8. I 11152 



AAID Length Length 

stfl — 



Score Probability 
1508 



Protein name 



Locus Name 



transcription-repair coupling factor



gp:AF023181 



Acc# 



AF023181 



Description 



Listeria monocytogenes transcription-repair coupling factor (mrdL), low
temperature requirement B protein (ltrB), and DivIC homolog (divL) genes,
complete cds.






• 



NT 



AA 



ORF Name 



NTID 



AAID 



31375817 i J 2 44 



□ 



637b 



Length Length 




ITT" 



Score Probability 
0.042 



69 



Protein name 



Locus Name 



Acc# 





conserved hypothetical protein AF0188



pir:D69273


D69273 




Description 














ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 




3.2.147.1^1..±1».12.. 


1154 6376 


352 | 


1179 


444 


| 7.Se-42 




Protein name 






Locus Name 


Acc# 










sp:NAGA_VIBCH


O32445




Description



DEACETYLASE)




ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 




122&lL±l.±l...±b 


::: 1155 6377 


196 


591 


310 


1.2e-27




Protein name 






Locus Name 


Acc# 




hypothetical protein




pir:G75263


G75263 



If 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



3.41S.15.I3...±1...2.b.. 



TJWT 



14.1e-42 



Protein name 



Locus Name 



dihydroorotase Ipyrc; hax±±w 



pir:C75027



Acc# 



C75027 



Description 






ORF Name 



NTID 



34430317 ±2 38 



prrrr 



NT — . n Score Probability 

5.4e-27 



AAID Length Length 
f7TT3" 



TUT 



Protein name 



Locus Name 



protein-tyrosine phosphatase



gp:AB028630 



Acc# 



AB028630 



Description 

Clostridium perfringens hypav, bacH, ptp, cpd genes for hypothetical
protein, bacterial hemoglobin, protein-tyrosine phosphatase, 2',3'-cyclic
nucleotide 2'-phosphodiesterase, partial and complete cds.



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


45547S3_r2_45 


1158 


6350 


161 


4S6 


211 


3.8e-17 


Protein name 








Locus 


Name 


Acc# 



sp:V0^CJ>A(J^U 



P54486 



Description 

HYPOTHETICAL 17.3 KD PROTEIN IN CCCK-SODA INTERGENIC REGION



ORF Name 


NTID AAID 


NT AA 
Length Length 


Score 


Probability 


A57.Q3l41..±1...2lS 


1155 6381 


263 792 


531 


4.7e-bl 


Protein name 




Locus Name 


Acc# 


putative glycosyl transferase.


gp:SC6D7


AL133213 


Description 




Streptomyces coelicolor cosmid 6D7.




ORF Name 


NTID AAID 


NT AA 
Length Length 


Score 


Probability 


±110A..±X..±1 


1160 6362 


669 2070 


435 


7.4e-41 


Protein name 




Locus Name 


Acc# 






sp:NAGB_BACSU


O35000


Description 










PHOSPHATE DEAMINASE) (GNPDA) (GLCN6P DEAMINASE)




i 






NT 



AA 



ORF Name 



NTID 



4876090 cl HA 



AAID Length Length 
FI5 



I2W 



Score Probability 
0.00012



IT77 



Protein name 



Locus Name 



sp:MFD_BACSU



Acc# 



P37474 



Description 

TRANSCRIPTION-REPAIR COUPLING FACTOR (TRCF)



NT 



AA 



ORF Name 



NTID 



AAID 



14676300 cl bb 



11162 



Length Length 




Score Probability 



55 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT



ORF Name


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 


r" ii£is76...±3....b.:A 


1163 


638S 


642 1525 


£86 


1.1e-88



Protein name 



Locus Name 



conserved hypothetical protein



pir:C72391



Acc# 



C72391 



Description 



NT 

opt? Kfaimv NTID AAID Length 


AA 
Length 


Score 


Probability 


8.3.5.1?.B...:£1...2l8. IT5¥ 6386 226 


678 


237 


1.9e-±t* 


Protein name 


Locus 


Name 


ACC# 




sp:MfiTH 


JJUMAlSf 




Description 








(METHIONINE SYNTHASE, VITAMIN-B12 DEPENDENT) (MS)




i 






ORF Name 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 



Score Probability 



TIT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



NTID 



AAID 



\ioAA:m^±^±i 



Protein name 



transcription regulator, Crp family



Description 



NT 



Length Length 



AA 

— Score Probability 



Locus Name 



pir:F72285



5.7e-06



Acc# 



F72285 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



1.3e-87



Protein name 



Description 



Locus Name 



sp:PATB_BACSU



Acc# 
Q08432 



PUTATIVE AMINOTRANSFERASE B



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



Protein name 



Locus Name 



outer membrane assembly protein (asmA) RP347



pir:E71691



Acc# 



E71691 



Description 






ORF Name 



NTID 



14648b7V cl yi 



TTZT 



— — Score Probability 

AAID Length Length — 

1.1e-10



JUT 



1107 



TTT 



Protein name 



transmembrane sensor 



Locus Name 
gp:AF051691



Acc# 



AF051691 



Description

Pseudomonas aeruginosa stress factor A (psrA), ECF sigma factor (fiuI),
transmembrane sensor (fiuR), and hydroxamate-type ferrisiderophore receptor
(fiuA) genes, complete cds.



ORF Name NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length — 


Probability 


I4875635_c:i_i:iy 1170 j 




313 


W2 602 


1.4e-58 


Protein name 






Locus Name 


Acc# 


conserved hypothetical protein ytqA




pir:D69999


D69999 


Description 


ORF Name NTID 


AAID 


NT 
Length 


AA 

— Score 
Length 


Probability 
9.2e-71 


aa&am:A..±i...ii 1171 




292 


375 717 




Protein name 






Locus Name 


Acc# 


lipoic acid synthase



pir:A75480


A75480 


Description 


ORF Name NTID 


AAID 


NT 
Length 


— , Score 
Length 


Probability 


22&l&±±L..cl..±4.b. H72 




145 


450 




Protein name 






Locus Name 


Acc# 



Description 
NO-HIT 






ORF Name 



NTID 



— — Score Probability 



TT7T 



AAID Length Length 



|2.2e-2^ 



Protein name 
-£THE 



Locus Name 



Acc# 



AF158372 



Description 



Flavobacterium johnsoniae hypothetical protein gene, partial cds; GldB
(gldB), GldC (gldC), and hypothetical protein genes, complete cds; and
hypothetical protein gene, partial cds.



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


23£2091O_c2_lli 


1174 






2S8 






Protein name 








Locus 


Name 


Acc# 


Description 














NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


llllTlb.b^U.^ 


1175 


«97 


304 


2712 


432 


$.6e-68 



Protein name 



Locus Name 



H5K outer membrane protein precursor : busO 
protein 



pir:JC6027



Acc# 



JC6027 



Description 



ORF Name NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


2A±l±5XD....al..±±l H7 6 




243 


732 10S 




Protein name 






Locus Name 


Acc# 


| hypothetical protein yvq*' 






"| pir:G70045 


G70045 



Description 






ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length J - 



l 2U9Bl^/ c'A 110 



TT7T 



6399 



VTTT 



l.be-IOO 



Protein name 



Description 



Locus Name 



sp:NAGB_BORBU 



Acc# 



O30564



PHOSPHATE DEAMINASE) (GNPDA) (GLCN6P DEAMINASE)



NT 



AA 



ORF Name 



NTID 



AAID 



6400 



Length Length 



Score Probability 
Tu4 



4.4e-27



Protein name 



Locus Name 



enoyl-acyl carrier protein reductase 



pir:H75330



Acc# 



H75330 



Description 



NT 



AA 



ORF Name 



NTID AAID Length Length 

Fim — 



T5T 



T7T" 



Score Probability 
6.2e-10



TO 



Protein name 



Locus Name 



Hypothetical protein APE2345 



pir:F72462



Acc# 



F72462 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
369 



6402 



TTTTT 



Score Probability 



2.5e-54 



Protein name 



Locus Name 



O-acetylhomoserine sulfhydrylase



pir:D72324 



Acc# 



D72324 



Description 



ORF Name 



Protein name 

Description 
NO-HIT



NTID 



AAID 



11181 



NT AA 

— , — , Score Probability 
Length Length : 



Locus Name 



Acc# 






ORF Name 



NTID 



NT AA 

— , — Score Probability 
AAID Length Length JL 



36360812 t2 3 6' 



T7T" 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID AAID Length Length 
\&£US 



F7X 



Score Probability 
2.8e-40



344 



Protein name 



Locus Name 



Acc# 



gp:SC9745



Description 

S.cerevisiae chromosome XIII cosmid 9745. 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length i - 



f !$A±llL±l...S.2 1 [TT34 



JUT 



4.6e-37



Protein name 



Locus Name 



probable translation factor yciO



pir:F64874



Acc# 



F64874 



Description 



ORF Name 



NTID 



3,M6.S.a5...±3....5.3. I 



Protein name 



maturation protein pPM32 



NT 



AA 



AAID Length Length 



Score Probability 



TT3~ 



7.3e-07 



Locus Name 



gp:AF166485



Acc# 



AF166485 



Description 

Glycine max maturation protein pPM32 (PM32) mRNA, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score 



14007687 tl 10 



12238" 



11937 



Probability 
4.8e-200



Protein name 



Locus Name 



DPP IV 



gp:AB008194



Acc# 



AB008194 



Description 

Porphyromonas gingivalis gene for DPP IV, complete cds.



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


4ll903?_c3_138 


1187 


£409 


281 


846 


134 


9.7e-07 



Protein name 

two-component response regulator
lytT - involved



Locus Name 



pir:B69655



Acc# 



B69655 



Description 



ORF Name 



NTID 



AAID 



NT 

Length Length 



AA 

— , Score 



T1W 



'6410 



TZUT 



WIT 



Probability 
2.6e-50 



Protein name 

hypothetical protexn jd^viu 



Locus Name 



pir:B65051



Acc# 



B65051 



Description 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


633/l&±L.±l..Al 


1189 


5411 


292 


879 


376 





Protein name 

conserved hypothetical protein ykrA



Locus Name 



pir:C69862



Acc# 



C69862 



Description 



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



6412 



Protein name 



Locus Name 



RNA polymerase ECF-type sigma factor homolog
yhdM



pir:C69826



Description 



3.7e-12



Acc# 



C69826 



ORF Name 



NT ID 



1191 



Protein name 

SAM-dependent methyltransferase



Description 



NT 



AA 



AAID Length Length 
7TS 



Score Probability 
1.0e-30



TTT 



Locus Name 



pir:C72086



Acc# 



C72086 



ORF Name 



NT ID 



— — Score Probability 
AAID Length Length 



16414 



[TT7T 



1.7e-92



Protein name 



Locus Name 



Acc# 



sp:PRIA_BACSU



Description 

PRIMOSOMAL PROTEIN N' (REPLICATION FACTOR Y)



NT 



ORF Name 



NT ID 



AAID Length Length 



AA 

— Score Probability 



6415 



r72" 



5.8e-ib 



Protein name 



Locus Name 



hypothetical protein MJ0V4y 



pir:E64393



Acc# 



E64393 



Description 







ORF Name 



NT ID 



^ — Score Probability 

AAID Length Length 



TTTZTwrrjFryrr 



I7W 



1.5e-24



Protein name 



Locus Name 



two-component response regulator
lytT - involved



pir:B69655



Acc# 
B69655 



Description 



ORF Name 



NT ID 



— — Score Probability 

AAID Length Length 



\±20A0.b.^t±^b. - I |IT55 



mr 



3 .be-UB 



Protein name 



Description 



Locus Name 
IsptY^KJ^oLl 



Acc# 
Q46791 



HYPOTHETICAL TRANSCRIPTIONAL REGULATOR IN KDUI-LYSR INTERGENIC REGION



ORF Name 



ima^t.U4 | 



NTID AAID Length Length 
FITS 



NT 

inj 



AA 



Score Probability 
12 .4e-0B 



Protein name 
hypothetical protein 



Locus Name 
"1 •- jpirz^J^S : 



Acc# 
C72325 



Description 



ORF Name 



NTID 



Score Probability

AAID Length Length - 



TTTT 



16419 



[2T 



Protein name 



Locus Name 



Acc# 



Description 
NO-HIT









ORF Name 



NTID 



NT AA 
T — T — Score Probability 
AAID Length Length JL 



14252182 11 45 



Protein name 



1642 0" 



2.6e-18



Locus Name 



Acc# 



resolvase 



Description 



pir:S38652



S38652 



ORF Name 



NT AA 

— , — , Score Probability 
NTID AAID Length Length *- 



:UU.5AU2.5....r.l...l 



Protein name 



FPT7 1 fTTPT 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name



NT AA 

— , — , Score Probability 
NTID AAID Length Length — ; z ~ 



14.^..7.U3.m...r,2....6.6. 



Protein name 



T2W 



^ 1 [TTS 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



NT AA 

— , — ' Score Probability 
NTID AAID Length Length — - — ~ x 



Protein name 



T7UT" 



63 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



NT AA 

^^-rr, x — ^ w — , Score Probability 
NTID AAID Length Length JL 



Protein name 



TZUT 



Locus Name 



Acc# 



Description 



NO-HIT



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



15852585 ci 170 



TZUT 



[6425 



3TT" 



1296 



T72T 



2.3e-177



Protein name 



Locus Name 



hypothetical protein



Description 



pir:JQ1020



Acc# 



JQ1020 



ORF Name 



Protein name 



Description 



NT 



AA 



NT ID 



AAID Length Length 

m$ — 



Score Probability 



Locus Name 



Acc# 



NO-HIT



ORF Name 



Protein name 



Description 



NT 



AA 



NT ID 



AAID Length Length 



Score 



S3" 



ST 



Probability 
0.045 



Locus Name 



sp:SRD2_CAEEL



Acc# 
Q21767 



SRD-2 PROTEIN



ORF Name 



Protein name 



Description 



NT 



— Score Probability 



NTID 



AAID 



Length Length 



Locus Name 



Acc# 



NO-HIT



ORF Name 



20328257 cl 164 



Protein name 

Description 
NO-HIT



NT 



AA 



NTID 



AAID 



TZUT 



— , „ — ^. Score Probability 

Length Length ~ - x - 

1 pm 

Locus Name Acc# 



ORF Name 



NTID 



AAID 



zu3.2a&;/3....rju..x5j- ,.„i 11208 



Protein name 



conserved hypothetical protein 



Description 



NT 



AA 



Length Length 



Score Probability 
T53 



Locus Name 



pir:E72312



3.1e-15



Acc# 
E72312 



ORF Name 



NTID 



AAID 



|2.Mm.7.5....c.2....2.ii I (TTu^ 



Protein name 

Description 
NO-HIT



NT AA 
Length Length 

7W 



— , Score Probability 



Locus Name 



Acc# 



ORF Name 



Protein name 

Description 
NO-HIT



NT 



AA 



NTID 



2&101A2&...t2..M I fl^Tu - 



AAID Length Length 
— 



Score Probability 



Locus Name 



Acc# 



ORF Name 



NTID 



NT AA 

, , ^ — _ x — ^ Score Probability 
AAID Length Length JL 



|2.0.7.2.315.fi...c2...2.ia I IT2TT 



72T 



Protein name 



Locus Name 



conserved hypothetical protein HP0713 



pir:A64609



Acc# 



A64609 



Description 






ORF Name 



NT ID 



20976426 ±3 114 



TZTT 



Protein name 



NT AA 

_ — Score Probability 
AAID Length Length L - 



asparaginase homolog yccC 



Description



3.9e-07 



Locus Name 



pir:F69754



Acc# 



F69754 



ORF Name 



NT ID 



NT AA 

— — Score Probability 
AAID Length Length ,L ~ 



2±£A1$.25..±±..:.L 



T7TT 



11380 



Protein name 
Description 

ANAEROBIC C4-DICARBOXYLATE TRANSPORTER DCUB



Locus Name 



sp:DCUB_HAEIN



Acc# 



P44855 



NT 



AA 



ORF Name 



NTID 



1214 



AAID Length Length 




Score Probability 
0.0070



Protein name 



Locus Name 



putative transmembrane efflux protein



gp:SCF91



Acc# 



AL132973 



Description 



Streptomyces coelicolor cosmid F91. 



ORF Name 



NTID 



NT AA n _ . , . . . . 

— , — , Score Probability 
AAID Length Length 



TTT5~ 



0.031 



Protein name 



Locus Name 



sp:SPRC_XENLA



Acc# 



P36378 



Description 

(OSTEONECTIN) (ON) (BASEMENT MEMBRANE PROTEIN BM-40)






NT 



AA 



ORF Name 



NT ID 



23617137 c3 258 



AAID Length Length 
"GTS 



TIT 



Score Probability 
2^7 



4.5e-23 



Protein name 



Locus Name 



sp:YJV7_YEAST 



Acc# 
P40893 



Description 

HYPOTHETICAL 22.0 KD PROTEIN IN HXT11-HXT8 INTERGENIC REGION



NT 



AA 



ORF Name 



NT ID 



AAID 



23631252 c2 23£ 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



ORF Name 



NTID 



AAID 



NT AA 
^ — , — , Score Probability 
Length Length ^ 



Z3.6.3.15.5lQ...c1...17A 



TTT 



TTT 



TTT 



S.ue-OS 



Protein name 



Locus Name 



hypothetical protein 



pir:A64502



Acc# 



A64502 



Description 



ORF Name 



NTID 



216.111B.l...a±..±9.1 I [T2T^ 



Protein name 



NT AA 

— , — , Score Probability 
AAID Length Length *- 



6441 



probable integrase/recombinase



Description 



TUT 



TTTT 



Locus Name 



pir:B71194



3.6e-0b 



Acc# 



B71194 






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



Protein name 
Description 



TTZW 



15442" 



[7S~ 



0.025 



Locus Name 



Acc# 
Q80910 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



|240S60b6 tj Ibb 



1221 | 



5.5e-04 



Protein name 



Description 



Locus Name 



sp:CEBA_BACAM



Acc# 



P23939 



BAMHI CONTROL ELEMENT



ORF Name 



Protein name 



NTID 



rnrr 



AAID 



6444 



NT AA 
* — Score 

Length Length 



TST" 



Locus Name 



Probability 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



TFZT 



NT 



AAID Length Length 



— Score Probability 



TBT" 



Locus Name 



Acc# 



Description 



NO-HIT






ORF Name 



24394017 t3 153 



Protein name 



NTID 



AAID 



TTTT 



NT AA 

— , — , Score Probability 
Length Length 



TWT 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



NTID 



NT AA 

— , ■ — , Score Probability 
AAID Length Length • L - 



M4S.5.ai2...al...27.a I 



T&TT 



0.0070 



Protein name 



Locus Name 



arylesterase 



gp:AF044683



Acc# 



AF044683 



Description 



Agrobacterium radiobacter putative dihydrolipoamide S-acetyltransferase
(dla) gene, partial cds; arylesterase (ada) gene, complete cds; and putative
dihydrolipoamide dehydrogenase (dlh) gene, partial cds.



ORF Name 



NTID 



i4£i££3....ci...a&i I imf 



Protein name 



AAID 



NT AA 

— , — , Score Probability 
Length Length *- 



IT 



TTT 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



NTID 



1&6.±C)M1..±2..£.1 1 \TTT7 



Protein name 



AAID 



6449 



NT 



AA 



Length Length 
TIM 



Score Probability 



TTT 



Locus Name 



Acc# 



Description 
NO-HIT






ORF Name 



24832203 ti 29 



Protein name 



NTID 



AAID 



1228 



16450 



NT AA 

— , — „ Score Probability 
Length Length • L - 



52 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



NTID 



NT AA 

Score Probability
AAID Length Length 



24ftft2a3L2...al„.I7.5 1 11327 



T7F" 



FT7~ 



1.2e-31



Protein name 



Locus Name 



adaptive response regulatory protein 



gp:AF047839 



Acc# 



AF047839 



Description 



Pseudoalteromonas sp. S9 putative glucosyl hydrolase precursor and adaptive
response regulatory protein (ada) genes, complete cds.



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length J - 



6452 



TOT 



4.4e-32



Protein name 



Locus Name 



unknown 



gp:AF006034



Acc# 



AF006034 



Description 



Clostridium pasteurianum 1,3-propanediol dehydrogenase (dhaT) gene, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



1£53&11&..±1..M. I [T2TT 



Length Length 
7B - 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 
NO-HIT






NT 



ORF Name 



NTID 



25514666 ti 14V 



AAID Length Length 

cm 



AA 

— , Score 



Probability 
10.010 



Protein name 



Locus Name 



probable serine/threonine-protein kinase



pir:T41341



Acc# 



T41341 



Description 



ORF Name 



NTID 



AAID 



NT AA 
, — Score 
Length Length 



TZTT 



T7T 



Probability 
3.9e-08



Protein name 

hypothetical protein MTH847



Locus Name 



pir:A69213



Acc# 



A69213 



Description 



ORF Name 



NTID 



AAID 



NT AA 
* — Score 

Length Length 



Probability 
THJe^l 



Protein name 



Description 



Locus Name 



Acc# 



sp:PRIA_BACSU



PRIMOSOMAL PROTEIN N' (REPLICATION FACTOR Y)



— — Score Probability 
Length Length 





AA 



ORF Name 



NTID 



AAID 



TZTT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






ORF Name 



NT ID 



NT AA 

— , , — , Score Probability 
AAID Length Length £ - 



T23T 



[2 . Ue-91 



Protein name 



Description 



Locus Name 



sp:ASG2_ECOLI



Acc# 
P00805 



AMIDOHYDROLASE II) (L-ASNASE II) (COLASPASE)



NT 



AA 



ORF Name 



NT ID 



AAID 



TZTT 



Length Length 
TT73 — 



J5T 



Score Probability 




Protein name 



Locus Name 



mannose-1-phosphate guanylyltransferase



Description 



pir:H72303



2.6e-57 



Acc# 



H72303 



ORF Name 



NT ID 



AAID 



NT AA 
t t^Z^ Score Probability 
Length Length — : 



Protein name 

Description 
(EC 2.3.1.-) 



3.0e-56 



Locus Name 



sp:YJV8_YEAST



Acc# 



P40892 



ORF Name 



Protein name 



NT 



AA 



NT ID 



TUT 



AAID Length Length 
FTTI — 



Score Probability 
531 



2.3e-51



Locus Name 



oxidoreductase, aldo/keto reductase family



pir:E72284



Acc# 



E72284 



Description 






ORF Name 



Protein name 



NTID 



NT 



AA 



AAID Length Length 




Score Probability 



ST" 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



— — Score Probability 
Length Length 

PT5T 



AA 



ire 



Locus Name 



Acc# 



Description 
NO-HIT 






ORF Name 



Protein name 



Description 



NO-HIT



NTID 



AAID 



NT AA 
Length Length 



Score Probability 



TZZT 



TT5T 



Locus Name 



Acc# 



ORF Name 



Protein name 



Description 



NO-HIT



NT 



AA 



NTID 



AAID Length Length 



— Score Probability 



TZZT 



I7T 



[OT 



Locus Name 



Acc# 



ORF Name 



Protein name 



NT 



NTID 



AAID Length Length 



AA . i . 

— Score Probability 



TUT 



0.014 



Locus Name 



hypothetical protein a 



pir:S49113



Acc# 



S49113 



Description 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 
53 



Score Probability 
6.3e-12



Protein name 



Locus Name 



integrase 



gp:BFU75371



Acc# 



U75371 



Description 



Bacteroides fragilis transposon Tn4555 TnpA (tnpA), integrase (int), TnpC
(tnpC), excisionase (xis), mobilization protein (mobA), and beta-lactamase
(cfxA) genes, complete cds; and unknown genes.



ORF Name 


NTID 


AAID 


NT AA 
— — , Score 
Length Length 


Probability 


34l04l27_t3_±4l 


1246 


646$ 


OTB "2'427 273 


3.0e-^0 


Protein name 






Locus Name 


Acc# 








sp:IRGA_VIBCH


P27772 


Description 










IRON-REGULATED OUTER MEMBRANE VIRULENCE PROTEIN PRECURSOR




ORF Name 


NTID 


AAID 


NT AA 
— — , Score 
Length Length 


Probability 


3.426.0.y.Il..±i....ib.u. 


1247 


6469 


166 501 




Protein name 






Locus Name 


Acc# 


Description 










NO-HIT


ORF Name 


NTID 


AAID 


NT AA 
— — - — Score 
Length Length 


Probability 




1248 


6470 


716 2151 




Protein name 






Locus Name 


Acc# 


Description 














NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



3S7047&6 ±2 92 



6471 



52" 



T7T 



2.be~29 



Protein name 



Locus Name 



integrase IntN1



gp:BUU51917 



Acc# 



U51917 



Description 



Bacteroides uniformis insertion element NBU1 fragment, integrase IntN1 gene,
complete cds.



NT 



AA 



ORF Name 



NT ID 



3S.38817 c2 247 



T2W 



AAID Length Length 



Score Probability 
TJT§ — 



l.Se-134 



Protein name 



Locus Name 



aspartate ammonia-lyase



gp:WSAJ2933



Acc# 



AJ002933 



Description 



Wolinella succinogenes aspA, dcuA genes and partial ansA gene.



ORF Name 



NTID 



AAID 



NT AA 
, — , — , Score Probability 
Length Length ^ 



19An.$Al.±2..3±... I 



ST7T 



'2 . Be-11 



Protein name 



Locus Name 



AlgZ



gp:PAU52431



Acc# 



U52431 



Description 



Pseudomonas aeruginosa AlgR-cognate sensor AlgZ (algZ) gene, complete cds.



NT 



AA 



ORF Name 



NTID 



Aaftft&S3L±a..AX i 



AAID Length Length 
6474 



TOT" 



Score Probability 
TT7 



3.1e-08



Protein name 



Locus Name 



transcription regulator 



gp:AF008220



Acc# 



AF008220 



Description 
Bacillus subtilis rrnB-dnaB genomic region. 






NT 



ORF Name 



NT ID 



AAID Length Length 



AA 

— Score Probability 



14072187 cl 172 



1764 



9.3e-95 



Protein name 



Locus Name 
sp:DXS_BACSU



Acc# 



P54523 



Description 



ORF Name 



\42$±&'A ci 207 



Protein name 



NTID 



NT 



AAID Length Length 



AA 

— Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



— — Score Probability 
Length Length 

— 



401 



143 



3.§e-05 



Locus Name 



gp:ECASPA



Acc# 



X02307 



Description 

E. coli aspA gene for aspartase (L-aspartate ammonia-lyase) (EC 4.3.1.1).



ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


A±5A0M....z2...21& 


±256 5478 


806 


2421 


171 





Protein name 



Locus Name 



R27-2 protein 



pir:T30296



Acc# 



T30296 



Description 






ORF Name 



NTID 



AAID 



4960312 tl 4b 



Protein name 



putative integrase



Description 



Score Probability

Length Length - 



IT2T" 



1572 



7.2e-16



Locus Name 



lgp:BA1242byi 



Acc# 



AJ242593 



Bacteriophage A118 complete genome.



ORF Name 



5?$7i2 ri 42 



Protein name 



NTID 



AAID 



£3W 



— — Score Probability 
Length Length — 

— 



Locus Name 



Acc# 



Description 



ORF Name 



Protein name 



NTID 



6481 



NT 



AA 



AAID Length Length 



Score Probability 



FT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



1260 



hypothetical protein slr2078 



Description 



NT 



AA 



AAID Length Length 



Score Probability 



Locus Name 



1.6e-20 



Acc# 



S77566 






ORF Name 



9&00466 cl 152 



Protein name 



NT ID 



TZZT 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



probable prefoldin subunit APE1440



Description 



NT 



AA 



Length Length 



Score Probability 
0.00034 



Locus Name 



Acc# 



G72622 



ORF Name 



NTID 



NT AA 
T — ^. — _ Score Probability 
AAID Length Length JL 



2439 



2.7e-45 



Protein name 



Locus Name 



Acc# 



putative transmembrane protein Wzc 



gp:AF104912



AF104912 



Description 



Escherichia coli K30 capsule biosynthesis cluster, partial sequence.



ORF Name 



NTID 



NT AA 

Score Probability
AAID Length Length JL 



TuT" 



KIT 



73" 



0.034 



Protein name 



Locus Name 



Acc# 



nuclear factor kappa-B2



gp:HSU20816



U20816 



Description 

Human nuclear factor kappa-B2 (NF-KB2) gene, partial cds.




ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , , Score 
Length 


Probability

75""" A o "~~ T A A 






"5487 


588 


1767 1406 




y . ue- 144 


Protein name 








Locus Name 




Acc# 










sp:SYQ_ECOLI 






Description 


























i 


ORF Name


NTID 


AAID 


NT 
Length 


AA 

- — , Score 
Length 


Probability 


i350033_i:l_ii 


1266 


648S 


224 


675 






Protein name 












Acc# 


Description 














NO-HIT












i 


ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability


lS.aii&.La^y.i 


X261 


6489 


172 


519 470 




X . 4e-44 


Protein name 








Locus Name 




Acc#










sp:TPX_MYCTU




P95282 


Description 
















PROBABLE THIOL PEROXIDASE














ORF Name 


NTID 


AAID 


NT 
Length


AA 

— , Score 
Length


Probability 




1268 


6430 


273 


1140 181 






Protein name 








Locus Name 




Acc# 
AF038866 




transposase 








gp:AF038866






Description 













Bacteroides fragilis transposon Tn5520 transposase (bipH) and mobilization
protein BmpH (bmpH) genes, complete cds.






ORF Name 



NT ID 



13545437 c3 9^ 



Protein name 



Description 



AAID 



15491 



— — Score Probability 
Length Length — 



FT 



0.00077 



Locus Name 



sp:DBH_THEMA



Acc# 



P36206 



DNA-BINDING PROTEIN HU



ORF Name 



1573S537 tl 24 



Protein name 



NTID 



TT7TT 



AAID 



NT 



AA 



Length Length 
BT5 



Score Probability 



TOT- 



Locus Name 



Acc# 



Description 



ORF Name 



Protein name 



NTID 



TTTT 



hypothetical protein Rv1624c



Description 



NT 



AA 



AAID Length Length 



— Score Probability 



TTT 



Locus Name 
pir:F70558



1. de-OS 



Acc# 



F70558 



ORF Name 



Protein name 



NTID 



TZTF 



conserved hypothetical protein . mtHV^ 



Description 



NT 



AA 



AAID Length Length 



Score Probability 



TTT 



Locus Name 



pir:B69196



1.1e-10



Acc# 



B69196 






ORF Name 



NTID 



AAID 



NT AA 

— — • , Score Probability 
Length Length 



24070736 c2 97 



Protein name 



1273 



649S 



Locus Name 



1.7e-32



Acc# 



Description 



sp:YQGH_BACSU



P46339 



REGION (0kPV2) 



ORF Name 



126438887 c3 ioo 



Protein name 



NTID 



TUT 



AAID 



NT 



AA 



Length Length 
TT75 — 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



NTID 



AAID 



NT AA 

Score Probability
Length Length 



Protein name 



IB" 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



TTTT 



16456 



NT 



AA 



AAID Length Length 



Score Probability 



T5T 



TZT 



Locus Name 



1.3e-07 



Acc#



hypothetical protein 



Description 



pir:T28682



T28682 






ORF Name 



NTID 



Score Probability

AAID Length Length 



34179077 tl lb 



TUT 



6499 



T7IT 



1.2e-10



Protein name 



Description 



Locus Name 



sp:EPSA_BURSO



Acc# 



Q45407 



EPS I POLYSACCHARIDE EXPORT OUTER MEMBRANE PROTEIN EPSA PRECURSOR


ORF Name 


"' NT AA 

— — , Score 
NTID AAID Length Length 


Probability 


36i3i937_cl__V8 


1276 6566 14$ 450 123 


2.5e-67 


Protein name 


Locus Name 


Acc# 


phosphate-binding protein PstS pir:H69097


H69097 


Description 


ORF Name 


NT AA 
* — Score 
NTID AAID Length Length 


Probability 




| TTT9 — 6501 VF5~~ 1413 656 


" 2.7e-64 


Protein name 


Locus Name 


Acc# 


t Gumu protein 


pir:S67820


S67820 


Description 


ORF Name 


M Score 
NTID AAID Length Length 


Probability 




TZWO " " 158 **TT 243 


3.6e-ai 


Protein name 


Locus Name 


Acc# 


hypothetical protein (repA 5' region) pir:S30120


S30120 


Description 


ORF Name 


NT AA 

tz — — Score 

NTID AAID Length Length 


Probability 




TZ3T 6563 216 651 308 


2.0e-27 



Protein name 

DedA family protein



Locus Name 



pir:B75253



Acc# 



B75253 



Description 






ORF Name 



NTID 



NT AA 
Score Probability
AAID Length Length JL 



6721850 c3 99 



6504 



TFT 



2.1e-23



Protein name 



Locus Name 



Acc# 



N-acetylmuramoyl-L-alanine amidase homolog



pir:G64126



G64126 



Description 



ORF Name 



NTID 



NT AA 
T — T — Score Probability 
AAID Length Length JL 



Protein name 



phosphate-binding protein PstS



Description 



Locus Name 



pir:H69097



3.1e-38



Acc# 



H69097



ORF Name 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 



Score Probability 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length — 



H^lOAllg^c^lU | fTWT 

Protein name 



681 



12 . 5e-34 



Locus Name 



gp:PGU60208



Acc#



U60208 



Description 

Porphyromonas gingivalis orf1, orf2 and orf3 genes, complete cds.



NT 



AA 



ORF Name 



NTID 



1737S2 c2 S4 



AAID Length Length 



1200 



Score Probability 
^5 



9.2e-62



Protein name 



Locus Name 



Acc# 



sp:YBDG_ECOLI



Description 

HYPOTHETICAL 46.6 KD PROTEIN IN PHEP-NFNB INTERGENIC REGION



NT 



AA 



ORF Name 



NTID 



AAID 



22063387 c2 92 



Length Length 



Score Probability 
.B.5e-67 



Protein name 



Locus Name 



alpha-1,3/4-fucosidase precursor



gp:SSU39394



Acc# 



U39394 



Description 



Streptomyces sp. alpha-1,3/4-fucosidase precursor gene, complete cds.



NT 



AA 



ORF Name 



NTID 



&ms5...aa...iia i ix^f 



AAID Length Length 



Score Probability 
0.0027 



T06 



Protein name 



Locus Name 



Acc# 



sp:YEHT_ECOLI 



Description 

HYPOTHETICAL 27.9 KD PROTEIN IN MOLR-BGLX INTERGENIC REGION



ORF Name 



NTID 



AAID 



NT AA 
Length Length 



Score Probability 



1289



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 






ORF Name 



24406550 c2 99 



Protein name 



NTID



NT AA 

Score Probability
AAID Length Length JL 



Tuir 



Locus Name 



I . ue-05 



Acc# 



Description 



gp:GGU25741



Group G streptococcus strain g6 emmL gene, partial cds.



U25741 



ORF Name 



Protein name 



NTID 



NT 



AA 



AAID Length Length 



Score Probability 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



NTID 



NT AA 
_ T — — _ Score Probability 
AAID Length Length -L 



2S.g.3..«a.7...±l...iS. I fTT^ 



Protein name 



probable extracellular nuclease 



Description 



1071 



TUT 



Locus Name 



pir:D75625



Acc# 



D75625 



NT 



AA 



ORF Name 



NTID 



2&2.iaai2...ca„.ii7. I iijsj 



AAID Length Length 



^T5" 



Score Probability 
o . 00S8 



Protein name 



Locus Name 



silent surface layer protein



gp:AF079365



Description 



Acc# 



AF079365 



Lactobacillus crispatus silent surface layer protein (cbsB) gene, partial
cds.






ORF Name 



Protein name 



NTID 



— — Score Probability 
AAID Length Length 



1294



Locus Name 



MAR binding filament-like protein 1: MFP1
protein



Description 



pir:T07111 



0.043 



Acc# 



T07111 



NT 



AA 



ORF Name 



NTID 



\lBABAi:L.a±...M.. 



AAID Length Length 
— 



Score Probability 

0.00012



Protein name 



Description 



Locus Name 



sp:PFEA_PSEAE



Acc# 



Q05098 



FERRIC ENTEROBACTIN RECEPTOR PRECURSOR



NT 



ORF Name 



NTID 



AAID 



3.0.19.20.B.6....cl...:/.b.., 



Length Length 
T7TJT 



AA 

— Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



sp:BGAL_THEMA



BETA-GALACTOSIDASE (LACTASE)



ORF Name 



NTID 



AAID 



NT AA 
— — Score 
Length Length 



lllS£2&L.al..±1.6...... I 



73u~ 



Probability 
% .$e-72 



Protein name 



Locus Name 



Acc# 



DNA-directed DNA polymerase, III chain
dnaX: DNA polymerase III (gamma and tau
subunits) dnaX



bir:aiJ7y6 



Description 



ORF Name 



34570437 ±2 47 



Protein name 



Description 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



1298 



W5T 



TT7T" 



8.5e-H6 



Locus Name 



sp:PEPD_ECOLI



Acc# 



P15288 



(PEPTIDASE D)



ORF Name 



3S$>£0§07 cl 74 



Protein name 



NTID 



AAID 



T2W 



transaldolase-related protein



Description 



NT 



AA 



Length Length 
TIT 



Score Probability 



FIT 



Locus Name 



pir:G72394



Acc# 



G72394 



ORF Name 



NTID 



NT AA 

— ■ — , Score Probability 
AAID Length Length 



Iff 



2.0e-0£> 



Protein name 



Description 



Locus Name 



IgprAPU'm^ 



Acc# 



U72238 



Anabaena PCC 7120 ORFR1, ORFR2, ORFR3, ORFR4, and ORFR5 genes, complete
sequences.



ORF Name 



NTID 



NT AA

Score Probability
AAID Length Length 



|447.a2L5....G3....:LUb... 



1301 



2 .6e-£$ 



Protein name 



Description 



Locus Name 



sprBGALJ^ACME 



Acc# 



O52847



BETA-GALACTOSIDASE (LACTASE)






NT 



AA 



ORF Name 



NTID 



AAID 



TTUF 



Length Length 
353 



Score Probability 



TIT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



l£mbJ.3...±3....1i=L 



TJUT 



T5!T 



5.1e-15



Protein name 



Locus Name 



probable proteinase PAB1960



pir:A75179



Acc# 



A75179 



Description 











NT 


AA 




ORF Name 


NTID 


AAID 


Length 


Length 




lS.B.ZUiAl^I.Z^L^ 


1304 


6526 


157 


474 






Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 



F7T 



Score Probability 
0.012 



TUHT 



Protein name 



Description 



Locus Name 



l gp:A T AtJ0l2bbi 



Acc# 



AC012563 



Arabidopsis thaliana chromosome I BAC T23K23 genomic sequence, complete
sequence.






ORF Name 



NT ID 



NT AA 

— — Score Probability 
AAID Length Length 



Protein name 



TJU£~ 



TEW 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



NTID 



AAID 



— , — , Score Probability 
Length Length 



Protein name 



TJUT 



?rnr 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



\11LS.L&L'1.±'L.;<L 



Protein name 



NTID 



1308 



AAID 



NT AA 

Score Probability
Length Length 

mi — 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



NTID 



AAID 



NT AA 

Score Probability
Length Length — ; 



\A0MAi:L±±..:L. 



Protein name 



TJUT 



268 I [3TT7 



Locus Name 



0.00062 



Acc# 



Description 



sp:Y066_METJA



Q60377 



HYPOTHETICAL PROTEIN MJ0066






ORF Name 



6837782 £1 4 



Protein name 



NTID 



TJTU~ 



AAID 



NT 



AA 



Length Length 



— Score Probability 



Locus Name 



Acc# 



Description 



ORF Name 



Protein name



NTID 



TTTT 



NT AA 

— — , Score Probability 
AAID Length Length 



TTB" 



Locus Name 



Acc# 



Description 



Z3 



NO-HIT



ORF Name 



Protein name 



NTID 



TTTT" 



AAID 



NT 



AA 



Length Length 

im — 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



10.mb.b.^...al...lB.L. 



Protein name 



NTID 



TJTT 



6b35 



NT 



AA 



AAID Length Length 
231 



Score Probability 



SIT 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



TJTT" 



AAID 



NT AA _ , , . _ . . 
— • — Score Probability 
Length Length 



Locus Name 



Acc# 



Description 



NO-HIT







ORF Name NTID AAID 


NT 
Length 




AA 

— , Score 
Length 


Probability 


n8B0S53_ci_ly4 1215 6537 


485 


1467 1U4U 


5.iSe-105 


Protein name 


Locus Name 


Acc# 


cell division protein 


gp:PAL249201


AJ249201 


Description 




Prevotella albensis ftsQ (partial), ftsA and ftsZ genes and
ORF-fts (partial).












ORF Name NTID AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


I339S3i2_cl_lil 1316 6538 


489 


1470 12 /b 


6.$e-130 


Protein name 






Locus Name 


Acc# 








sp:MURC_P0kcjl 


ATI O *3 1 

yblo 3 1 


Description 










ACETYLMURAMOYL-L-ALANINE SYNTHETASE)








1 


ORF Name NTID AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


JSEKAl^alJLkA Ur7 


254 


76b 341 


" 6.4e-3i 


Protein name 


Locus Name 


Acc# 


FtsQ 


gp:AB004555


AB004555 


Description 










Porphyromonas gingivalis genes for FtsQ, FtsA, FtsZ, complete cds.


ORF Name NTID AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


l£Afi£l.,.c2...l£SL 1J1B 6540 


669 


2010 3334 


0.0 


Protein name 


Locus Name 


Acc# 


DNA gyrase B subunit


gp:AB017713


AB017713 



Description 



Bacteroides fragilis gyrB gene for DNA gyrase B subunit, complete cds.






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



16593937 cl 127 



TTTT 



TTUT 



|6.7e-36 



Protein name 



Locus Name 



sp:YLAu_tsA(JblU 



Acc# 



O07639



Description 

HYPOTHETICAL 43,7 KT? PftOfBlN IN Kf^RJj-^VC A INTllSilkCjStllO kU<iloN 



ORF Name 



NT ID 



— — Score Probability 
AAID Length Length 



1S80S437 c3 192 



Protein name 



Locus Name 



UDP-N-acetylmuramoylalanine-D-glutamate
ligase



pir:H70477



Description 



K.9e-l8 



Acc# 



H70477 



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



im£3....cl...i3.:L 



1321 



3T5T" 



52" 



0.047 



Protein name 



Locus Name 



OrlslOc 



gp:SCU42227



Acc# 



U42227 



Description 



Saccharomyces cerevisiae replicative mitochondrial DNA polymerase catalytic
subunit (MIP1) gene, nuclear gene encoding mitochondrial protein, partial
cds, and putative 10-formyl-tetrahydrofolate binding protein (FTB1) gene,
complete cds.



ORF Name 



NT ID 



NT AA „ n , , . n ■ . . 
Score Probability
AAID Length Length — — — 



TTTT 



TST" 



2.1e-2b 



Protein name 



Locus Name 



hypothetical protein 1 



pir :S70830 



Acc# 



S70830 



Description 






NT 



AA 



ORF Name 



NTID 



AAID 



20010316 t2 72 



TTZT 



6545 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



2LiAftflia...c3L„.ia7. I irra* 



AAID Length Length 

— 



TUJT 



Score Probability 
7.6e-60 



Protein name 



Locus Name 



unknown 



gp:EFU94707



Acc# 



U94707 



Description 



Enterococcus faecalis strain A24836 cell wall/cell division gene cluster,
yllB, yllC, yllD, pbpC, mraY, murD, murG, divIB, ftsA and ftsZ genes,
complete cds.



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length JL 



Iii5l4i.7.-.±a...7.a I [ITSF 



6547 



lW 



2.5e-13



Protein name 

Description 
HYPOTHETICAL 80.2 KD PROTEIN



Locus Name 



sp:YGY4_HALSQ



Acc# 



P21562 



IN THE 5' REGION OF GYRA AND GYRB (ORF 4)



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length JL 



i.4e-i^l 



Protein name 



Locus Name 



cell division protein 



gp:PAL249201 



Acc# 



AJ249201 



Description 



Prevotella albensis ftsQ (partial), ftsA and ftsZ genes and
ORF-fts (partial).



NT 



AA 



ORF Name 



NTID 



AAID 



r'3 100 



rrrrr 



Length Length 
FTS 



Score Probability 



T7T 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 




TIT 



Score Probability 
|B..9e-34 



Protein name 
Description 

a»mns enzyme) 



Locus Name 



Acc# 



□ 



NT 



AA 



ORF Name 



NTID 



AAID 



i4£)..7.^1.7.7....a3....i7.S I 



Length Length 
— 



Score Probability 
|3.9e-72 



TUT 



Protein name 



Locus Name 



hypothetical protein 



pir :S76527 



Acc# 



S76527 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



1330 



Length Length 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NT ID 



AAID 



124414077 tl ±1 



TJJT 



Length Length 



Score Probability 



WIT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



AA 



ORF Name 



NTID 



AAID 



\2&5.6A1.±1„35. I |ITJ2 



Length Length 



TIT 



Score Probability 
T5E 



4.ae-09 



Protein name 



Locus Name 



conserved hypothetical protein



pir :H75460 



Acc# 



H75460 



Description 



ORF Name 



NT AA 

— — , Score Probability 
NTID AAID Length Length JL 



TUT 



1 [m — "I 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



AA 



ORF Name 



\25&1116:L..tl..Al I [TTR 



NTID AAID Length Length 





Score Probability 



— I mu 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID AAID Length Length 

553 



Score Probability 



TT33" 



555T 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






NT 



AA 



ORF Name 



NTID 

rmz — 



AAID 



Length Length 



Score Probability 
■7.5e-i24 



Protein name 



Locus Name 



hemolysin A



gp:PMU27587



Acc# 
U27587 



Description 



Prevotella melaninogenica hemolysin A (phyA) gene, complete cds.



ORF Name 



NTID 



NT AA 

— J , — i _ 1 Score Probability 
AAID Length Length " L - 



2931518 C3 185? 



TUT 



l2.6e-8-5 



Protein name 



Locus Name 



UDP-MurNAc-tripeptide synthetase



pir :E70450 



Acc# 



E70450 



Description 



ORF Name 



NTID 



NT AA 

— A , — L1 Score Probability 
AAID Length Length — i - 



6560 



3T" 



7F" 



1.5e-06



Protein name 



Locus Name 



phospho-N-acetylmuramoyl-pentapeptide-
transferase (mraYl) RP595



pir:E71664



Acc# 



E71664 



Description 



ORF Name 



NTID 



NT AA 

— — Score Probability 
AAID Length Length — 



1152A±5.2.±±..A0. I [OH? 



T71T8~ 



Protein name 



Locus Name 



conserved hypothetical protein aq_854



pir:B70374



Acc# 



B70374 



Description 






ORF Name 



Protein name 



Description 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



1340 



TT7TT 



1.0e-62



Locus Name 
|sp:MUk(J_l*A(JiiU 



Acc# 



(EC 2.4.1.-)



ORF Name 



I333§8bb7 ±3 iu>i 



Protein name 



Description 



NT 



AA 



NTID 



AAID 



Length Length 



Score Probability 



Locus Name 



Acc# 



NO-HIT



ORF Name 



Protein name 



Description 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



pL5u" 



T5T 



Locus Name 



sp:DUT_AQUAE



Acc# 



O66592



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



119£6.0M..±±^. 



TJZT 



77T 



TTTT 



3".8e-15 



Protein name 



Locus Name 



putative TonB-dependent outer membrane
receptor



gp:AF048749



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.






NT 



AA 



ORF Name 



NT ID 



34250912 cl 122 



1344 



, * „ — — ^, T — Score Probability 
AAID Length Length - J - 



[TUT 



Protein name 



Locus Name 



Acc# 



hypothetical protein 2 



pir:I4075S> 



Description 



ORF Name 



NT AA , , . . 
Score Probability
NTID AAID Length Length • JL 



±l&l2...a , L.±5.1 I 



Protein name 



probable RNA polymerase sigma factor 



Description 



TW5~ 



4.fle-id 



Locus Name 



pir :T42015 



Acc# 



T42015 



NT 



AA 



ORF Name 



NTID 



±&0£21..±±..±± I 



AAID Length Length 
TOSS — 



Score Probability 

isra — 



2.7e-14



Protein name 



Description 



Locus Name 



gp:AB028868 



ACC# 



AB028868 



Mus musculus P4t2l)n mRNA, partial cds,



ORF Name 



NT AA n 
"ntt 1 7\ t\ tfi t — t — ^ Score Probability 
NTID AAID Length Length £ - 



T3T7" 



TIT 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NTID 



AAID 



407y66tt t2 66 



Length Length 



Score Probability 




0.032 



Protein name 



Locus Name 



RING finger protein



gp:AF036255



Acc# 



AF036255 



Description 



Rattus norvegicus RING finger protein mRNA, complete cds.



ORF Name 



NTID 



NT AA 

AAID Length Length Probability 



4174014 tl 7 



7W 



TUT 



6.7e-05 



Protein name 



Locus Name 



RecO 



gp:HIU17037



Acc# 



U17037 



Description 



Haemophilus influenzae opacity associated proteins OapA and OapB (oapA and
oapB) genes, complete cds, and DNA recombination and repair protein (recO)
gene, partial cds.



NT 



AA 



ORF Name 



NTID AAID Length Length 
\^T2 



IT7I 1 \5TG 



Score Probability 
TT7UUTT 



Protein name 



Locus Name 



Acc# 



DNA-binding protein HB: DNA-binding protein
HU: DNA-binding protein II



Description 



pir^OOOlS 



ORF Name 



Protein name 
Description 

NO-HIT



NT AA 

NTID AAID Length Length Probability 



H3T" 



T5T 



Locus Name 



Acc# 






NT 



AA 



ORF Name 



S117268 cl 121 



NTID AAID Length Length Probability 




TOT 



T7T 



Protein name 



Description 



Locus Name 



sp:YABB_ECOLI



Acc# 



P22186 



HYPOTHETICAL 17.4 KD PROTEIN IN FRUR-FTSL INTERGENIC REGION (ORFC)



ORF Name 



NT AA 

NTID AAID Length Length Probability 



byy40S7 cl 125 



1212 7 



TTT" 



2.1e-34 



Protein name 

Description 
BINDING PROTEIN)



Locus Name 



sp:SP5D_BACSU



ACC# 



Q03524 



ORF Name 



NTID 



NT AA 

, , „ _ _ — ^, Score Probability 
AAID Length Length JL 



b.u.:/.Zb.b.d...cl...l2.5. I 11354 



TTS" 



1131 



9.ie-5I 



Protein name 



Description 



Locus Name 



sp:MRAY_BORBU



Acc# 



Q44776 



(UDP-MURNAC-PENTAPEPTIDE PHOSPHOTRANSFERASE)



ORF Name 



NTID 



NT AA 
_ T — Score Probability 
AAID Length Length jL 



TT5F* 



T7u" 



5.4e-il 



Protein name 



Locus Name 



probable ribosomal protein S20 rpsT



pir :G70684 



ACC# 



G70684 



Description 






NT 



AA 



ORF Name 



NT ID 



67^437 ±1 36 



AAID Length Length 
^575 



Score Probability 




7.0e-13 



Protein name 



probable sulfolipid biosynthesis protein SqdA



Locus Name 
pir:A42380



Acc# 



A42380 



Description 



ORF Name 



NT ID 



NT AA 
— * — Score 
AAID Length Length 



IT 



Protein name 

Description 
NO-HIT



Locus Name 



Probability 



Acc# 



ORF Name 



NT ID 



NT AA n 
Score Probability
AAID Length Length ;■ - JL 



lM42?.ai..±l...A6. I ITJSff 



TITT 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



, , ™ T — _ — Score Probability 
AAID Length Length — — JL 

6^1 



T5T 



8.1e-08 



Protein name 



Locus Name 



potassium channel alpha subunit Kv2.2



gp:XLU20342



Acc# 



U20342 



Description 



Xenopus laevis potassium channel alpha subunit Kv2.2 (XShab12) mRNA,
complete cds.






ORF Name 



NTID 



AAID 



NT AA 
Length Length Probability 



i2bi*Jd61 c2 100 



TT57T 



6582 



STE- 



NTS" 



2TT" 



b.2e-28 



Protein name 



Locus Name 



probable protoporphyrinogen oxidase (hemK)
RP847



pir:G71646



Acc# 
G71646 



Description 



ORF Name 



NTID 



NT AA , 

AAID Length Length Probability 



M2b.6.dl£)...±^..J.7. I 



TIFT" 



6.ie-42 



Protein name 



Locus Name 



conserved hypothetical protein MTH700 



pir:E69193



Acc# 



E69193 



Description 



NT 



AA 



ORF Name 



NTID 



i45Ct&aifl..±a...:/.fe. I ircra 



Score Probability
AAID Length Length JL 

I53H3 — 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



15615.43.6...C.I...15.2 1 11363 



. - T — — — ^, Score Probability 
AAID Length Length — JL 

wz&s — 



JUT 



f WTT 



S.ie-iS 



Protein name 



Locus Name 



hypothetical protein yitL 



pir:E69840



Acc# 



E69840 



Description 



ORF Name 



T76¥ 



Protein name 

Description 
NO-HIT



NT 



AA 



NTID AAID Length Length Probability 

— 



7u~ 



Locus Name 



Acc# 



ORF Name 



NTID 



19953510 C2 102 



Protein name 



NT AA 

Score Probability
AAID Length Length i - 



argininosuccinate lyase



Description 



TT7T" 



Locus Name 



pir :D70419 



|4.1e-61 



Acc# 
D70419 



ORF Name 



NTID 



23.bJ.3.&b.b....Q±...&0. 



Protein name 



Description 



NT 



AA 



tvtvxt^ t ■ •> _ . — _ Score Probability 
AAID Length Length L - 



Locus Name 



sp:RECX_PSEAE



REGULATORY PROTEIN RECX



13 .9e-0{ 



Acc# 
P37860 



ORF Name 



Protein name 



Description 



LIGASE)



NT AA , 

NTID AAID Length Length Probability 



1357 



V.iie-55 



Locus Name 



sp:ASSY_METJA



Acc# 
Q60174 



ORF Name 



Protein name 



NTID 



NT AA , , . , . 
, , „ T — ^ — ^. Score Probability 
AAID Length Length L 



Locus Name 



Acc# 



Description 
NO-HIT






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



243S3376 t'i 6 



T3ZT 



1602 



Protein name 



Description 



Locus Name 



|gp:AB024946 



Acc# 



AB024946 



Escherichia coli plasmid pB171 DNA, complete sequence.



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length A - 



25663952 tl 2 



TT7W 



l.Se-24 



Protein name 
Description 

(EC 2.4.2.-) (MONOFUNCTIONAL TGASE)



Locus Name 



sp:MTGA_ACICA



Acc# 



O24849



NT 



AA 



ORF Name 



NTID 



AAID 



TT7T" 



Length Length 
"TIT 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length " L - 



1372 



riboflavin-specific deaminase



Description 



1044 



1.7e-53 



Locus Name 



pir :G72207 



Acc# 



G72207 






NT 



AA 



ORF Name 



NT ID 



32228388 c2 122 



TTTT 



Score Probability
AAID Length Length JL 





TIT 



0.001S 



Protein name 

Description 
HEXAMERIN PRECURSOR



Locus Name 



sp:HEXA_BLADI



Acc# 



Q17127 



NT 



AA 



ORF Name 



NTID 



333£7i75 c3 i3l 



TTTT" 



AAID Length Length 




Score Probability 



W5T 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



3.fi2il44a..±2...15 1 [TT75 



Length Length 
T21T 



Score Probability 




7.5e-53 



Protein name 



Locus Name 



N-acetyl-gamma-glutamyl-phosphate reductase,



pir :F69508 



ACC# 



F69508 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



40.0.:m0.1...c2...12i... I [IT7S 



Length Length 
2T2TT 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 






NT 



AA 



ORF Name 



40^0^0 « SI 



TT77 



NTID AAID Length Length 




T5T 



Score Probability 
TTZ 



1.6e-34 



Protein name 



Locus Name 



pyrroline-5-carboxylate reductase



gp:CSAJ10739



ACC# 



AJ010739 



Description 



Clostridium sticklandii proC gene and 5' flanking region.






NT 



AA 



ORF Name 



NTID 



4577005 c3 156 



AAID Length Length 



Score Probability 

— 



|4.ie-52 



Protein name 



Locus Name 



sp:PYRE_BACSU



Acc# 



P25972 



Description 

OROTATE PHOSPHORIBOSYLTRANSFERASE, (OPRT) (OPRTASE)






NT 



AA 



ORF Name 



NTID 



ID 



48.0.15. 5.2...£2..A5i.. 



6501 



Length Length 



Score Probability 



1277 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



m0A0.5.±..±2J±& I [TT5TT 



16602 



Length Length 



Score Probability 



FIT 



Protein name 
Description 

Mrrrr? 



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NTID 



4^04813 t2 16 



TTST" 



AAID Length Length 
— 



T7T 



Score Probability 
F7T5 



Protein name 



Locus Name 



sp:ARGD_BACSU



Acc# 
P36839 



Description 
ACETYLORNITHINE AMINOTRANSFERASE, (ACOAT)



NT 



AA 



ORF Name 



NTID 



5110712 c2 114 



fTTST 



AAID Length Length 
S^ul — 



1242 



Score Probability 
13351 



i.5e-57 



Protein name 



Locus Name 



sensory transduction histidine kinase 
slr2104 :protein slr2104 :protein slr2104 



pir:S75136



Acc# 
S75136 



Description 



NT 



AA 



ORF Name 



NTID 



5iama...Gi...ai I [osr 



AAID Length Length 
6605 



1980 



Score Probability 
: 



0.033 



Protein name 



Locus Name 



hypothetical protein F10M10.30



pir :T04772 



Acc# 



T04772 



Description 



ORF Name 



NTID 



NT AA 

— , — , Score Probability 
AAID Length Length — 



S/lltitibA..±2J±l I \TJM 



1.1e-21



Protein name 



Locus Name 



arginine repressor



gp:BSAJ10954



ACC# 



AJ010954 



Description 

Bacillus stearothermophilus argR gene and partial recN gene. 






NT 



AA 



ORF Name 



NTID 



AAID 



5270302 11 11 



Length Length 



Score Probability 
Tim — 



B.0e-i27 



Protein name 



Locus Name 



acetyl-CoA synthetase related protein



pir:F69193



Acc# 



F69193 



Description 



NT 



AA 



ORF Name 



NTID 



Score Probability
AAID Length Length z ~ 

— 



T5T 



TTJulT 



T7T" 



1.0e-23 



Protein name 



Locus Name 



probable malate dehydrogenase, : 2-ketoacid
dehydrogenase: protein sll0891: 2-ketoacid
dehydrogenase: protein sll0891



pir:S75735



Acc# 



S75735 



Description 



ORF Name 



NTID 



NT AA 
Score Probability
AAID Length Length ; JL 



10.M5.3.27...±2..„10.1 1 [OFT 



6609 



7£TT 



V.Se-40 



Protein name 



Locus Name 



115K outer membrane protein precursor : SusC 
protein 



pir:JC6027



Acc# 



JC6027 



Description 



ORF Name 



NT AA 

_ _ _ — ^. — ^ Score Probability 
NTID AAID Length Length — JL 



lD.5.S.D.0.5.2..±3....iai ..J [TIM 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



10662877 Cl 202 



11389 



][ 



AAID Length Length 
555 



55TT 



Score Probability 
i.6e-18 



Protein name 



Locus Name 



putative transposase 



gp:AF007429



Acc# 



AF007429 



Description 



Haemophilus paragallinarum IS-like putative transposase gene, complete cds.



ORF Name 



10725542 c3 342 



Protein name 



NTID 



NT AA 

— ■ , — , Score Probability 
AAID Length Length 



5517" 



5TT 



T2T 



Locus Name 



Acc# 



Description 



ORF Name 



Protein name 



Description 



NT 



AA 



NTID 



AAID 



5"5TT 



Length Length 
TUT 



Score Probability 
T7I5 " 



tf .5e-i3 



Locus Name 



sp:MTGA_HAEIN



Acc# 



P44890 



(EC 2.4.2.-) (MONOFUNCTIONAL TGASE)



ORF Name 



NTID 



AAID 



NT AA 

— — , Score Probability 
Length Length 



110.i5.uaa...c2...3.D.6.... ...J 



55TT - 



T51¥" 



Protein name 



Locus Name 



mobilization protein B 



gp:AF118242



Acc# 



AF118242 



Description 

Bacteroides fragilis mobilization protein B (mobB) gene, complete cds.



ORF Name 



Protein name 



NT ID 



AAID 



1393 



TOTS- 



NT 



AA 



Length Length 
TFT 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



!liaaii.li....cl..J.4y... 



NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


1394 


6616 




867 




113 


0.00062 



Protein name 



Locus Name 



transmembrane sensor 



gp:AF060193



Acc# 



AF060193 



Description 



Pseudomonas aeruginosa pigACDE operon, complete sequence; hypothetical PigB
(pigB) gene, complete cds.



ru 



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



i2i&as&L.c£...aia 



1335 



148 



S.le-ll 



Protein name 



Locus Name 



collagen-like protein



gp:BTU67921



Acc# 



U67921 



Description 



Bacillus thuringiensis plasmid pTX14-1, MOB, REP, and collagen-like protein
genes, complete sequence.



ORF Name 



NT AA 

— — , Score Probability 
NTID AAID Length Length ■ 



6618 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT






ORF Name 



NTID 



AAID 



1J0 7 1^4J ci 216 



TT5T 



6619 



Protein name 



conserved hypothetical protein



Description 



NT AA „ , i . 
— , — , Score Probability 
Length Length ^ 



1401™ 



Locus Name 



pir:H72331



V.be-37 



Acc# 



H72331 



NT 



AA 



ORF Name 



NTID 



AAID 



imO.&O.O....Gl...2.0..7. I [OM 



6620 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



1399 



6621 



1206 



l . ae-09 



Protein name 



Locus Name 



transposase 



gp:AF038866 



Acc# 



AF038866 



Description 



Bacteroides fragilis transposon Tn5520 transposase (bipH) and mobilization
protein BmpH (bmpH) genes, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



\±&l6M.9±...z±Jl±l I [TTuTT 



6622 



Length Length 



Score Probability 
|7.2e-16 



Protein name 



Locus Name 



RNA polymerase sigma factor SigZ-like protein 



gp:AF137263



ACC# 



AF137263 



Description 



Bacteroides thetaiotaomicron 30S ribosomal protein S16-like protein, fucose
gene cluster, and RNA polymerase sigma factor SigZ-like protein (sigZ) genes,
complete cds.






NT 



AA 



ORF Name 



NTID 



AAID 



145S9067 £3 150 



Length Length 
S3 - 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



|14££!552..±2...7„7. I |T¥TJ^ 



0.00021 



Protein name 
Description 

THERMOREGULATORY PROTEIN LCRF



Locus Name 



sp:LCRF_YERPE



Acc# 



P28808 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



0.025 



Protein name 



Locus Name 



hypothetical protein aq_2087



pir:H70478



Acc# 



H70478 



Description 



ORF Name 



14S.7.5.3.0.2...cl...26.7.., 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 
TUT 



Score Probability 



Locus Name 



Acc# 



Description 



NT 



AA 



ORF Name 



NT ID 



15659758 il 51 



AAID Length Length 



Score Probability 



6627 



est 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID AAID Length Length 

-gzzz — 



Score Probability 
Jf .9.e-±2 



T3T 



Protein name 



Locus Name 



transposase 



gp:AF038866



Acc# 



AF038866 



Description 



Bacteroides fragilis transposon Tn5520 transposase (bipH) and mobilization
protein BmpH (bmpH) genes, complete cds.



ORF Name 



NTID 



NT AA 

— , — Score Probability 
AAID Length Length JL 



1407 



1572 



2.2e-211



Protein name 



Locus Name 



sp:TRA2_BACFR



Acc# 



Q45119 



Description 

TRANSPOSASE FOR INSERTION SEQUENCE ELEMENT IS21-LIKE 



NT 



AA 



ORF Name 



NTID 



AAID 



11408 



Length Length 



11437 



Score Probability 
J2F8 



2.2e-44



Protein name 



Locus Name 



Acc# 



sp:PPOX_MYXXA



P56601 



Description 

PROTOPORPHYRINOGEN OXIDASE, (PPO)



ORF Name 



1649i'5$3 c2 279 



Protein name 



NT ID 



AAID 



G621 



NT AA o 

— — , Score Probability 
Length Length 



T57" 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NT ID 



1410 



AAID 



NT 



AA 



Length Length 

pet — : 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



iaaa.7.&a.7...±2L...in.. 



Protein name 



NTID 



1411 



AAID 



NT 



AA 



Length Length 
T&9 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



1412 



hypothetical 26.8K protein



Description 



NT 



AA 



Length Length 
FT™ 



pur 



Score Probability 
0.00024 



Locus Name 



Acc# 



JC2322 



ORF Name 



NTID 



aifiamflL.c3L.3i5& .....| p^tt 



Protein name 



AAID 



NT AA 

— — Score Probability 
Length Length 



Locus Name 



Acc# 



Description 



NO-HIT 



NT 



AA 



ORF Name 



NTID 



AAID 



122459687 c3 347 



Length Length 
[TFT 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



RTF 



hub" 



|2.7e-06 



Protein name 



Locus Name 



immunoreactive 53 kD antigen PG123



gp:AF144641 



ACC# 



AF144641 



Description 



Porphyromonas gingivalis strain W50 immunoreactive 53 kD antigen PG123 gene,
complete cds.



NT 



AA 



ORF Name 



NTID 



AAID 



\226.&2£>A2...q±..211 I [14TF 



Length Length 



Score Probability 
MB 



6.1e-42



Protein name 
Description 

PUTATIVE HEAT SHOCK PROTEIN HTPX



Locus Name 



sp:HTPX_STRGC



Acc# 



O30795



NT 



AA 



ORF Name 



NTID 



\228.2A0A1..±2...B.2 



T2TT 



AAID Length Length 
\TTT~ 



Score Probability 
0.00018



Protein name 



Locus Name 



MbpB 



gp:BFU25716



Acc# 



U25716 



Description 



Bacteroides fragilis mobilization protein MbpA (mbpA), MbpB (mbpB) and MbpC
(mbpC) genes, complete cds.






ORF Name 



NT ID 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



Protein name 



T1TF" 



TIT 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



T%TT 



AAID 



15541 



NT 



AA 



Length Length 



Score Probability 



TIT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



NTID 



AAID 



NT AA 

— — , Score Probability 
Length Length 



Protein name 



flUT 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



NTID 



AAID 



NT AA ^ _ . . . . 
— — , Score Probability 
Length Length 



Protein name 



6643 



2T6" 



Locus Name 



Acc# 



Description 



NO-HIT



NT 



AA 



ORF Name 



NT ID 



AAID 



16644 



Length Length 




Score Probability 
2 . ie-06 



141 



Protein name 



Locus Name 



immunoreactive 53 kD antigen PG123 



gp:AF144641 



Acc# 
AF144641 



Description 



Porphyromonas gingivalis strain W50 immunoreactive 53 kD antigen PG123 gene,
complete cds.



NT 



AA 



ORF Name 



NT ID 



AAID 



23£735l0 c3 34b 



Length Length 



Score Probability 
2.4e-31



Protein name 



Locus Name 



putative acetyltransferase



gp : SCF1 



Acc# 



AL117322



Description 

Streptomyces coelicolor cosmid F1.



NT 



AA 



ORF Name 



NTID 



AAID 



240.26.M.2..±i...z:/. 



Length Length 

2F7~ — 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



ii2Laata:/.„.ci...ai6 



6647 



\7UTT 



3.1e-37



Protein name 



Locus Name 



Acc# 



unknown 



gp:AF079317



AF079317 



Description 

Sphingomonas aromaticivorans plasmid pNL1, complete plasmid sequence.






ORF Name 



l 24112frli> ci 197 



Protein name 



NT ID 



T52T 



NT 



AA 



AAID Length Length 
— 



Score Probability 



1ST 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NT 



AA 



NTID 



1427 



AAID Length Length 
271 1 F9B 



Score Probability 
S.0e-40 



Locus Name 



immunogenic 23 kDa lipoprotein PG3



gp:AF145799



Acc# 



AF145799 



Description 



Porphyromonas gingivalis strain W50 immunogenic 23 kDa lipoprotein PG3 gene,
complete cds.



ORF Name 



Protein name 



NTID 



NT AA 

— — , Score Probability 
AAID Length Length 



T¥2T 



TT7T 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



vrlE protein 



Description 



NTID 



AAID 



NT AA ■ _ , . . 
— — , Score Probability 
Length Length — 



TIT 



9.5e-08



Locus Name 



pir:T17384



ACC# 



T17384 






NT 



AA 



ORF Name 



NTID 



24542137 tl b'A 



T3Tu~ 



AAID Length Length 

Tim — 



TUT 



Score Probability 
2 . ye-n 



TFZ 



Protein name 



Locus Name 



Acc# 



putative outer membrane porin



|gp:AF030977 



Description 

Vibrio cholerae glutamyl tRNA synthetase (gltX) gene, partial cds; putative
outer membrane porin (ompA), unknown protein, vibriobactin receptor precursor
(viuA), and ViuB protein (viuB) genes, complete cds; and VibF (vibF) gene,
partial cds.





ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


246422l2__r2_li0 


1431 


6653 


301 


$06 


631 





Protein name 



Locus Name 



sp:YBFI_BACSU



Acc# 



O31448



Description 

HYPOTHETICAL 33.8 KD PROTEIN IN GLPT-PURT INTERGENIC REGION



ORF Name 



NTID 



NT AA 

— — , Score Probability 
AAID Length Length 



\1&6A1B.2&...±±J1& ...J p£TZ 



77" 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



NT AA 

— — Score Probability 
AAID Length Length 



\lA115MA...a±...2±$. I 



TTIT 



^T 



Locus Name 



Acc# 



mobilization protein A



gp:AF118241



AF118241 



Description 

Bacteroides fragilis mobilization protein A (mobA) gene, complete cds.






NT 



AA 



ORF Name 



NTID 



AAID 



24726592 c3 355 



1434 



Length Length 

— 



T5T 



Score Probability 
1.0e-30



Protein name 



Description 



Locus Name 



sp:MTGA_ECOLI



Acc# 



P46022 



(EC 2.4.2.-) (MONOFUNCTIONAL TGASE)



ORF Name 



NTID 



I2480S426 ci 2l0 



Protein name 



hypothetical protein MTH847 



Description 



NT 



AA 



AAID Length Length 



Score Probability 



T2TT 



Locus Name 



pir:A69213



Acc# 



A69213 



31 



ORF Name 



NTID 



AAID 



NT AA 

— ' , — , Score Probability 
Length Length — 



24M.7.5.Sl...c2...I0.ii 



Protein name 



Description 



NO-HIT



[273" 



Locus Name 



Acc# 



ORF Name 



25.1B.27.7...,a^...2B.2... 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



1437 



[TT75~ 



1.7e-07 



Locus Name 



hypothetical protein ydaT 



pir:C69770



Acc# 



C69770 



Description 






ORF Name 



25511052 ci '±±2 



Protein name 
LemA 



Description 



NTID 



NT 



AA 



AAID Length Length 
ZTL 



2uT" 



Score Probability 
2.8e-57 



Locus Name 



gp:LMU66186



Acc# 



U66186 



Listeria monocytogenes LemA (lemA) gene, complete cds, and LemB (lemB) gene,
partial cds.



ORF Name 



125527053 ri 22 



Protein name 



NTID 



T¥73 1 I^T 



NT 



AAID Length Length 
TJTT 



AA 

— Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



iafii£45i&-.±£...a& I (144^ 



AAID 



NT 



AA 



Length Length 
222 



Score Probability 



73 



Locus Name 



Acc# 



Description 



NO-HIT 



1 1 w / 



ORF Name 



Protein name



NTID 



AAID 



1441 



hypothetical protexn au^tfuy . j 



Description 



— — Score P robability 
Length Length ■ 



0.00031 



Locus Name 



Acc# 



T33369 






ORF Name 



Protein name 



NTID 



AAID 



— — Score Probability 
Length Length 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 




Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



1444 



AAID 



6666 



NT 



AA 



Length Length 
TFS 



Score Probability 



FT 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


3.2213.3.12..±l...lb.i 


1445 


6667 


171 


516 


237 


6.8e-20 



Protein name 



Locus Name 



putative ECF sigma factor RpoE1



gp:AF049107



Acc# 



AF049107 



Description 



Myxococcus xanthus response regulator FrzZ (frzZ) gene, partial cds; alanine
dehydrogenase (aldA), putative ECF sigma factor RpoE1 (rpoE1), and response
regulator homolog (frzS) genes, complete cds; and unknown genes.






ORF Name 



Protein name 



NT ID 



6668



NT 



AA 



AAID Length Length 
7TT 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



NT AA 

— — , Score Probability 
AAID Length Length — 



B.2e-7S 



Locus Name 



sp:HEMN_AQUAE



Acc# 



O67886



Description 

OXYGEN-INDEPENDENT COPROPORPHYRINOGEN III



ORF Name 



Protein name 



NTID 



1448 



NT AA 

— — , Score Probability 
AAID Length Length 



YT" 



2HT 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



NT AA 

— , — Score Probability 
Length Length 



1445 



TIT" 



6.9e-06



Locus Name 



probable carboxy-terminal proteinase, D1



pir:T05975



Acc# 



T05975 



Description 



NT 



AA 



ORF Name 



NTID 



25202 tl b2 



AAID Length Length 



— Score Probability 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 


NT 

NTID AAID Length 


AA 

— , Score 
Length 


Probability 




1451 6613 3 88 1167 8b 






Protein name 




Locus Name 




Acc# 


integrase


i 


gp:HIVU69223




U69223 


Description 










HIV-1 strain CM&273 from Cameroon integrase (pol) gene, partial cds.


ORF Name 


NT 

NTID AAID Length 


AA 

— , Score 
Length 


Probability 




6674 291 876 111




to . ^te-uo 


Protein name 


Locus Name 




Acc# 
PC4110 


transcription 


regulator homolog : hypothetical


pir:PC4110


137 protein 










Description 










ORF Name 


NT 

NTID AAID Length 


AA 

— , Score 
Length 


Probability 




1453 6675 415 1248 288




5 . ue- z<± 



Protein name 



Locus Name 



hypothetical protein



gp:AF149851



Acc# 
AF149851 



Description 

Pseudomonas sp. KcJ hypothetical proteins, metallothionein-like protein,
MoeB-like protein, putative proteins, hypothetical protein, putative
oxidoreductase, and putative AMP ligase (entE) genes, complete cds; and
putative receptor gene, partial cds.






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



[4720187 12 99" 



[51T7F" 



7.1e-96 



Protein name 



Description 



Locus Name 



spiISTbjAAcJFk 



Acc# 
Q45120 



INSERTION SEQUENCE IS21-LIKE PUTATIVE ATP-BINDING PROTEIN



NT 



AA 



ORF Name 



NTID 



4S22751 12 101 



AAID Length Length 




^77 



^4" 



Score Probability 
1.8e-48 



T777 



Protein name 



Locus Name 



oxaloacetate decarboxylase, subunit alpha
(oadA) homolog



pir:C69406 



Acc# 



C69406 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 



NTID 



— — Score Probability 



AAID Length Length 



TIT 



TUT 



4.6e-14



Protein name 



Locus Name 



collagen 



Acc# 



AB008933 



Description 

Hydra vulgaris HT2 mRNA for collagen, partial cds.






NT 



AA 



ORF Name 



NT ID 



AAID 



5177157 ±2 3& 



1458 



Length Length 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



AA 



ORF Name 



NTID 



AAID 



5.2&£A<*2...a2....2.ai ...J [T^SS 



Length Length 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



5a2mi..±i.„aaL I [i^tt 



Length Length 



Score Probability 



7uT~ 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 



NTID 



AAID 



NT AA 

- — — , Score Probability 
Length Length ~^ L ~ 



6683 



TIT 



4.8e-15



Protein name 



hypothetical protein 



Locus Name 
pir:B72308



ACC# 



B72308 



Description 



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID 



Length Length 
PI 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



£025010 c3 333 



Protein name 



Description 



NT 



AA 



NT ID 



AAID Length Length 

tzs — 



Score Probability 



7T 



Locus Name 



Acc# 



NO-HIT 



ORF Name 



&M6.M7...±1...3.b>... 



Protein name 



Description 



NTID 



AAID 



NT AA 

— — , Score Probability 
Length Length 



TWTT 



Locus Name 



Acc# 



NO-HIT 



ORF Name 



Protein name



Description 



NT 



AA 



NTID AAID Length Length 
TOTT7 



Score Probability 



T7F" 



Locus Name 



Acc# 



NO-HIT



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



1752 



Score Probability 
|2.7e-i>0 



Locus Name 



sp:FECA_ECOLI



Acc# 



P13036 



Description 

IRON(III) DICITRATE TRANSPORT PROTEIN FECA PRECURSOR






ORF Name 



NTID 



Protein name 



glycine-ricn protein (clone wlO-lJ 



Description 



NT 



AA 



AAID Length Length 



Score Probability 



Locus Name 



pir:S14982



2.8e-05



Acc# 



S14982 



ORF Name 



Protein name 



NTID 



NT 



AA



AAID Length Length 



— Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



"KTT AA 

— — Score P robability 
AAID Length Length 



TJD" 



3.5e-05



Locus Name 



membrane glycoprotein 



gp:D88733



ACC# 



D88733 



Description 

Equine herpesvirus 1 DNA for membrane glycoprotein, complete cds.



ORF Name 



NTID 



NT AA 

— — Score Probability 
AAID Length Length 



TT7TT 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT






NT 



AA 



ORF Name 



NT ID 



TTTT 



AAID Length Length 
fettl 



Score Probability 
1.3e-20



Protein name 



Locus Name 



immunoreactive 53 kd antigen PG123 



gp:AF144641 



ACC# 



AF144641 



Description 



Porphyromonas gingivalis strain W50 immunoreactive 53 kD antigen PG123 gene,
complete cds.



ORF Name 



NT AA n ^ , i_ • t * j_ 
— , — , Score Probability 
NTID AAID Length Length JL 



807033 cl 25 



TTTT 



TTT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



±±0£.11&!...C2..A0. I IITTJ 



Length Length 



Score Probability 
1.1e-42



Protein name 

Description 
HYPOTHETICAL PROTEIN HI1523 



Locus Name 



sp:YF23_HAEIN



ACC# 



P44243 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



AA 



ORF Name 



NT ID 



14650012 2b 



AAID Length Length 

[os — 



Score Probability 




Protein name 



glucosidase II beta-subunit 



Locus Name 
gp:AF066061



Acc# 



AF066061 



Description 



Mus musculus glucosidase II beta-subunit gene, alternatively spliced
products, partial cds.



NT 



AA 



ORF Name 



NTID 



158351*61 Cl 28 



TT7F- 



AAID Length Length
T5T 



— Score Probability 



TT5" 



Protein name 



Locus Name 



Acc# 



Description 
NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



IT7T 



Length Length 
TTU1 — 



Score Probability 
|6.Se-43 



PT5T" 



Protein name 



Description 



Locus Name 



Acc# 



P54965 



HYDROLASE) (CBAH) (BILE SALT HYDROLASE)



NT 



AA 



ORF Name 



NTID 



T¥75~ 



AAID Length Length 
F7ul5 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 
NO-HIT






ORF Name 



1247^^5401 tl 2 



Protein name 

Description 
NO-HIT 



NT ID 



AAID 



NT AA 

— , — , Score Probability 
Length Length i - 



TTTT 



6701 



Locus Name 



Acc# 



ORF Name 



Protein name 

Description 
SOJ PROTEIN



NT 



AA 



NTID 



AAID 



10.1S.M.0.S...±1JX± I 



Length Length 
TTT" 



Score Probability 
4.0e-38



Locus Name 



sp:SOJ_BACSU



Acc# 



P37522 



ORF Name 



iii&am:L±a.„2L5i.. 



Protein name 



NTID 



AAID 



Hypothetical protein F20Dio.23U 



Description 



NT 



AA 



Length Length 



Score Probability 
5T5 : 



0.024 



Locus Name 



pir:T05638



Acc# 



T05638 



NT 



AA 



ORF Name 



fSSL 



NTID AAID Length Length 

— 



Score Probability 
I2.8e-10 — 



Protein name 



Locus Name 



endo-xylanase homolog PCZA361.14



pir:T17480



Acc# 



T17480 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



|S.Mai6:.7...±i....7. I pun - 



Length Length 
TT2 1 



Score Probability 



71 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 






NT 



AA 



ORF Name 



NT ID 



'1484 



AAID Length Length 
WIT 



6106 



Score Probability 
0.0024 



Protein name 



Locus Name 



outer membrane protein 



gp : BNROMPB 



ACC# 



L77614 



Description 



Bacteroides thetaiotaomicron outer membrane protein (susD) gene, complete
cds.



ORF Name 



NT ID 



AAID 



NT AA 

- — , — • Score Probability 
Length Length • ^ 



10547256 t2 6 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



Ul! 



ru 



NT 



AA 



ORF Name 



NT ID 



AAID 



Iim&as2...c2.„i4 ...J 



F7W 



Length Length 
|¥53 



Score Probability 
7.5e-05



Protein name 



Locus Name 



hypothetical protein aq_1018



pir:H70387



Acc# 



H70387 



Description 



if s 

'3? llSf 



ORF Name 



NT ID 



AAID 



2.11.7.6Al..±l...l.. 



6709 



Protein name 



NT 



AA 



Length Length 



— , Score Probability 



"2250 



Locus Name 



Acc# 



Description 



NO-HIT






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



148§" 



ATI 



Protein name 



Locus Name 



surface exclusion protein sep1 precursor



Description 



jpTrT377T71T 



TT7U7T 



Acc# 



S72375 



ORF Name 



Protein name 



NTID 



NT 



AA 



AAID Length Length 




Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



fructanase



Description 



NTID 



1490 



6712 



— — Score Probability 



AAID Length Length 



TT7W 



Locus Name 



pir:A36915



8.3e-141



Acc# 



A36915 



ORF Name 



Protein name 



Description 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



TTZT 



5.5e-231



Locus Name 
gptlaMRSCftL 



Acc# 



M83774 



Bacteroides fragilis levanase (scrL) gene, complete cds.



NT 



AA 



ORF Name 



NTID 



AAID Length Length 




Score Probability 



TIT 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



Protein name 



Locus Name 



renin-binding protein-related protein : protein
slr1975 : protein slr1975



Description 



pir:S75649



1.5e-68 



Acc# 



S75649 



ORF Name 



26.&19.b.b.b...±i....4.... 



Protein name 



NT ID 



NT 



AA 



AAID Length Length 
TT1 



Score Probability 



7T 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



5717 



hexuronate transporter homolog yjmG



Description 



NT 



AA 



AAID Length Length 



Score Probability 



417 



Locus Name 



pir:A69853



8.7e-23 



Acc# 



A69853 



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



l . 4e-ov 



Locus Name 



N-acetylneuraminate lyase



gp : CURIAM 



Acc# 



Y12876 



Description 

C.perfringens gene encoding N-acetylneuraminate lyase and two partial open
reading frames.






NT 



AA 



ORF Name 



NT ID 



5117^7 ±2 i 



AAID Length Length 

w& — 



Score Probability 
4.4e-16



Protein name 

Description 
HYPOTHETICAL PROTEIN HI0227



Locus Name 



Acc# 



P44583 



NT 



AA 



ORF Name 



NTID 



781^2 r3 V 



1498 



AAID Length Length 





6720 



Score Probability 
3.1e-53 



Protein name 



Locus Name 



115K outer membrane protein precursor : susC
protein



pir:JC6027



Acc# 
JC6027 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
71 - 



Score Probability 



TTT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 



NTID 



— — Score Probability 



AAID Length Length 



T5W 



TIT 



1.5e-61



Protein name 



Locus Name 



metabolite transporter homolog ytnA 



pir:D69814



Acc# 



D69814 



Description 






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



25433212 rl 2 



TTTZT 



TTT" 



T5T 



5.1e-14 



Protein name 



Locus Name 



alpha-N-acetylglucosaminidase



] | gp:M'l'Aia2Pg" 



Acc# 



Y18209 



Description 


Nicotiana tabacum mRNA for alpha-N-acetylglucosaminidase.




NT 

^PF Nam^ NTID AAID Length 


AA 
Length 


Score 


Probability 


c2 26 ^724 202 6 




327 


2.5e-29 


Protein name 


Locus Name 


Acc# 

probable cationic amino acid transporter


pir:T34694 


T34694 


Description 








n PF K^ m ^ NTID AAID Length 


AA 
Length 


Score 


Probability 


mrncsccriia ibu3 ^725 432 1299 


195 


1.9e-i2 



Protein name 

immunoreactive i32KD antigen PG4J. 



Locus Name 



gp:AF175716



Acc# 



AF175716 



Description 

Porphyromonas g ingivalis strain WbU immunoreactive b^KU an 
complete cds . 



tigenPG4l gene, 



NT 



ORF Name 



NTID 



AAID Length Length 



— Score Probability 



1G£l644^£1.»:l 



¥4lT 



|2.9e-bO 



Protein name 



Locus Name 



Acc# 



sp:ANAG_HUMAN



P54802 



Description 
GLUCOSAMINIDASE) (NAG)



ORF Name 



NT ID 



— — S core Probabi 1 i ty 
AAID Length Length 



TETTB" 



ZTZT 



TUT 



TTT 



Protein name 



Locus Name 



60kDa protein 



gp:AB004560



Acc# 



AB004560 



Description 

Porphyromonas gingivalis DNA for 60kDa protein, complete cds.



ORF Name 



Protein name 



NTID 



NT 



AA 



AAID Length Length 
2T5 



Score Probability 



IT 



Locus Name 



Acc# 



Description 
NO-HIT



ry 



ORF Name 



NTID 



— — Score Probability 
AAID Length Length 



13.&&$.0.fi&...c2....:/.B. 



i.6e-10S 



Protein name 



Locus Name 



115K outer membrane protein precursor : susC
protein



pir:JC6027



Acc# 



JC6027 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
TU1 — 



Score Probability 



ST 



Protein name 



Locus Name 



Acc# 



Description 
NO-HIT







ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


14'64780BJ:^_20 


- 1509 


6731 


287 


864 


126 


4.3e-06


Protein name 










Locus Name 


Acc# 












sp:YDIP_ECOLI


P77402 


Description 
















HYPOTHETICAL TRANSCRIPTIONAL REGULATOR IN AROD-PPSA INTERGENIC REGION


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


14660S42_cl_74 


1510 


6732 


207 








Protein name 










Locus Name 


Acc# 


Description 
















NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


l£g.3.MA2L...il...lS. 


1511 






291 


7 8 


0.064$ 


Protein name 


Locus Name 


Acc# 


hypothetical prot 


em c040Ub 








pir:S75372


S75372 


Description 


ORF Name 


NTID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


2.0.0.0..7.B.12...cl...i0.2. 


1512 


6734 


467 


X404 




1.7e-12


Protein name 


Locus Name


Acc# 


transposase 


gp:AF038866


AF038866 



Description 



Bacteroides fragilis transposon Tn5520 transposase (bipH) and mobilization
protein BmpH (bmpH) genes, complete cds.






ORF Name 



22683287 t3 56 



Protein name 



NTID 



T3TT 



"NTT A A 

— — Score Probability 
AAID Length Length 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



AAID 



NT AA 

— — Score P robability 
Length Length 



1025" 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID 



Length Length 
£"5 



Score Probability 
0.029 ~ 



FT7 



Locus Name 



hypothetical protein tubv.^ 



pir:T24826



Acc# 



T24826 



Description 



ORF Name 



NTID 



AAID 



2£.7.7.1M.i...ai...!al4.. 



Protein name 



hypothetical protein C33G8.2



Description 



NT 



AA 



Length Length 
TBI 



Score Probability 
F77e=T5 



ffTT 



Locus Name 



pir:T34137



Acc# 



T34137 



ORF Name 



Protein name 



NTID 



AAID 



11111&±X.±1..A1 



NT 



AA 



Length Length 
TTO 



Score Probability 



TUT 



Locus Name 



Acc# 



Description 



NO-HIT 



NT 



AA 



ORF Name 



NT ID 



34510418 ci 61 



AAID Length Length 
TSu" 



Score Probability 
II. oe-39 



¥2¥ 



Protein name 



Locus Name 



Hypothetical protein F36HI^.^ 



pir:T33457



Acc# 



T33457 



Description 



ORF Name 



NT ID 



AAID 



NT AA „ , , . n . , 
— — Score Pro bability 
Length Length — 



1519 



6741



7.9e-13



Protein name 



Locus Name 



unknown 



tap;US677i 



Acc# 



U96771 



Description 



Prevotella bryantii putative polygalacturonase, B-1,4-endoglucanase, and
mannanase genes, complete cds; and unknown genes.



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



16±9A93.B....alJ16. 



1520 



6742



T7TT 



2.4e-40



Protein name 



Locus Name 



hypothetical protein C33G8.2 



pir:T34137



Acc# 



T34137 



Description 



ORF Name 



Protein name 



NT ID 



1521 



NT 



AA 



AAID Length Length 



Score Probability 



1253 



5£T 



Locus Name 



sp:YBDN_ECOLI



3.1e-by 



Acc# 



P77216 



Description 

HYPOTHSTlcJAL 47. a Kb PROTEIN IN CSTA-UdiKi IMU'feikCil^lcJ ki^luN 



ORF Name 



397175 c2 77 



Protein name 



NT ID 



AAID 



6744 



— — Score Probability 
Length Length 



T7S" 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



4G£M.0A...cJ....lul.. 



Protein name 



NT ID 



— — Score Probability 
AAID Length Length 



12358 



i.3e-oa 



Locus Name 



Acc# 



P46360 



Description 

PESTICIN RECEPTOR PRECURSOR (IRPC) (IPR65)



ORF Name 



Protein name 



NT ID 



NT 



AA 



AAID Length Length 
[¥T7 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



S.3.0.D.3..7.7....al...6.6. 



Protein name 



NT 



AA 



NT ID 



AAID Length Length 



Score Probability 



TTT 



Locus Name 



sp:YBDM_ECOLI



Acc# 



P77174 



Description 

HYPO T HETICAL 21 .3 KB kkoTElN IN (Jtj ' l ' A-briB ti IN'l'EkcJJjHIC kUcjluU 






ORF Name 



NT ID 



KTT AA 

— — Score Probability 
AAID Length Length 



92" 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NT ID 



AAID 



l ama^tA^aa q prr 

Protein name 



Length Length 
1506 



Score Probability 
1.5e-45 



WT5 



Locus Name 



immunoreactive 51KD antigen PG52



gp:AF175719



Acc# 



AF175719 



Description 



Porphyromonas gingivalis strain W50 immunoreactive 51KD antigen PG52 gene,
complete cds.



NT 



ORF Name 



NTID AAID Length Length 



AA 

— Score Probability 



F75TT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 



NTID 



mOA:/.lB...±1...2. 



NT Score Probability 

3.1e-13



AAID Length Length 



TIF 



Protein name 



Locus Name 



Acc# 



P77989 



Description 
BETA-GALACTOSIDASE (LACTASE)






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



13704552 cl l2y 



WW 



1491 



1403 



1.9e-143



Protein name 



Locus Name 



sp:6PGD_TREPA



Acc# 



O83351



Description 



NT 



AA 



ORF Name 



NTID 



1375S530 c3 190 



HT3T* 



AAID Length Length 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 


NTID AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


13.S.D..7.ilZ...al...ifci6. 


"" 1532 6754 


74 


225 


77 


0.0095 



Protein name 



Locus Name 



putative signal transduction protein GarA 



gp:AF173844



Acc# 



AF173844 



Description 

Mycobacterium smegmatis garA-containing gene cluster, partial sequence.



ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 


ll^&llhl^alJLli - 


1533 




383 1152 


350 


7.2e-32 



Protein name 



Locus Name 



cytochrome d oxidase subunit II



gp:AF001503



Acc# 



AF001503 



Description 



Salmonella typhimurium cytochrome d oxidase subunit I (cydA) and cytochrome
d oxidase subunit II (cydB) genes, complete cds.












NT 


AA 


Score 


Probability 


ORF Name 


NT ID 


AAID 


Length 


Length 








14446r/_cl_132 


1534 


6756 


62 


189 


58 






Protein name 








LOCUS 


Name 




Acc# 



ribosomal protein 



] |gp:Tiayi4b' 



U87145 



Description 

Toxoplasma gondii chloroplast, complete genome. 



ORF Name NT ID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


1644l305J:3__l03 1535 


6757 | 




711 244 


1.2e-20 


Protein name 






Locus Name 


Acc# 


1 hypothetical protein O'zsvi 






pir:B65012


B65012 


Description 


ORF Name NT ID 


AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


I6.5.m5.£...±3....1D.l 1536 




450 


1353 7T7 


3.2e-71 


Protein name 






Locus Name 


Acc# 


hypothetical protein






pir:S76946


S76946 


Description 


ORF Name NTID 


AAID 


NT 
Length 


— , Score 
Length 


Probability 


IM12.a...al...li.O. lbT7 


6759


84 


255 63 


| 0.007B 



Protein name 



Locus Name 



Acc# 



AJ000258 



Description 

Homo sapiens trinucleotide repeat t 



-d(CGG)n-3ds binding proteinp^u-uuutfF . 



ORF Name 



NT ID 



NT AA 

— Score 
AAID Length Length 



196878.1b ti 87 



1290 



Probability 
|6.4e-$5 



Protein name 



Description 



Locus Name 



sp:YCAJ_HAEIN



Acc# 



P45262 



NT 



AA 



ORF Name 



NT ID 



12068766 cl 143 



AAID Length Length 
I575T 1 I 11578 



Score 



Probability 
3.Se-lll 



Protein name 



Locus Name 



sp:CYDA_AZOVI



Acc# 



Q09049 



Description 

CYTOCHROME D UBIQUINOL OXIDASE SUBUNIT I



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 
~ 



Score Probability 



TIT" 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NT ID 



AAID Length Length 




11341 



Score Probability 
12 .6e-JJi 



[STT 



Protein name 



Locus Name 



RumB(R391)



gp:XXU13633



Acc# 



U13633 



Description 

IncJ plasmid R391 rumA(R391) and rumB(R391) genes, complete cds.






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



6764 



fTJT 



1.7e-23 



Protein name 



urea transport protein 



Locus Name 
gp:AF167577



Acc# 



AF167577 



Description 



Actinobacillus pleuropneumoniae transcriptional regulator (apuR) gene,
partial cds; and putative periplasmic binding protein (cbiK), putative
cytoplasmic membrane protein (cbiL), cobalt membrane transport protein
homolog (cbiM), cobalt membrane transport protein homolog (cbiQ), cobalt
transport ATP-binding protein homolog (cbiO), and urea transport protein



\3 



ORF Name 



NTID 



Protein name 



molybdate metabolism regulator 



Description 



— — score Probability 



AAID Length Length 
1080 



Locus Name 



pir:B64979



2.2e-I-2 



Acc# 



B64979 



y 



ORF Name 



Protein name 



NTID



ABC transporter, ATP-binding protein



Description 



NT 



AA 



AAID Length Length 



Score Probability 



TTT 



Locus Name 



pir:H72385



9.1e-64



ACC# 



H72385 



ORF Name 



Utl&A£&&&.±lJAk 



Protein name 



NTID 



NT 



AA 



AAID Length Length 
3TB 



Score Probability 



105 



Locus Name 



Acc# 



Description 



NO-HIT







ORF Name 


NTID AAID 


NT 
Length 




AA 

— - , Score 
Length 


Probability 


24i0026b_c3_I88 


1546 6768


505 


1518 l^UU 


1.5e-132


Protein name 








Locus Name 


Acc# 










sp:G6PD_ACTAC


P77809 


Description 














OLPCOgE - 6 - £>M0£ tHM't! 




(SSM)) 










ORF Name 


"MTTD AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 




1547 6769


/y 






Protein name 








Locus Name 


Acc# 


Description 










1 












1 


ORF Name 


NTID AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


24£4mu..x&...ial 


1548 6770 


690 


2073 1U1 


0.6017 


Protein name 


Locus Name 


ACC# 


j hypotheticai protein MTH3b7 


pir:A69146


A69146 


Description 












ORF Name 


NTID AAID 


NT 
Length 


AA 

— ^ Score 
Length 


Probability 


aSLlil^li^ti™! 


1549 6771 


297 




3.8e-ib 


Protein name 








Locus Name 


ACC# 


putative secreted beta-galactosidase








AL133171 


Description 













Streptomyces coelicoior cosmict J?'8i. 






ORF Name 


NTID 


AAID 


NT 
Length 




AA 

— , Score 
Length 


Probability 


2566767b_c3_20S 


1550 


6772 


341 


1026




Protein name 












Locus Name 


Acc# 


Description 


















NO-HIT 




ORF Name 


NTID 


AAID 


NT 
Length 




AA 

— , Score 
Length 


Probability 
?y — — TCP* 


2sa3.aiii£x..±3L...iaa 


1551 


6773 


411 


1236 326 


z . be- 2 y 


Protein name 




Locus Name 


Acc# 


probable membrane 


protein bUS/B 






pir:F64826


F64826 


Description 
















ORF Name 


NTID 


AAID 


NT 
Length 




AA 

— , Score 
Length 


Probability 


l±16$$:±l^t±J14. 


.... 1552 


6774 


206 


621 222 


1 2.0e-17 
. 1 


Protein name 












Locus Name 


Acc# 














sp:YEHU_ECOLI




Description 
















HYPOTHETICAL 62.1




IN 


-BGLX INTERGENIC REGION 


PRECURSOR 1 


ORF Name 


NTID 


AAID 


NT 
Length 




AA 

— , , Score 
Length 


Probability 


l±MLltiti...a2..±B.9. 


1553 


6775


259 I 


T80 387. 


| 8.6e-J6 


Protein name 




Locus Name 


ACC# 


probable glucose- 


6-phosphate


1- dehydrogenase 






pir:C71319


C71319 


Description 
















ORF Name 


NTID 


AAID 


NT 
Length 




AA 

Length Score 


Probability 


3.2.5..7.6.3.5....ai...l^ 


..... 1554 


6776 


426 


1281 




Protein name 












Locus Name 


Acc# 


Description 



















NO-HIT






* 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 




Score 


Probability 




1555


6777 


419 


1260 








Protein name 








Locus 


Name 


Acc# 


Description 
















NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 




Score 


Probability 


3.M5.25..7....O....10.2. 


1556 


6778 


158 


477 








Protein name 








Locus 


Name 


Acc# 


Description 
















NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


maaaa^ci^w. 


1557 


6779 


515 


1548 




145 


9.4e-08



Protein name 



Locus Name 



conserved hypothetical protein AF0444



pir:D69305



Acc# 



D69305 



Description 





NT 


AA 

— - Score 


Probability 


ORF Name 


NTID AAID Length 


Length 


i.7.3.7.6.6.^tl^3.3. 


1558 6780 395


1188 578 


4.9e-bb 


Protein name 




Locus Name 


Acc# 



probable glutamate/ aspartate transporter 



pir:(i71iuy 



Description 



ORF Name 


NTID 


NT AA 
— — — Score 
AAID Length Length 


Probability 


5±±116:a..±±.±0.6. 


... 1559 


6781 149 450 304 


5.4-e-av 


Protein name 




Locus Name 


ACC# 


RumA(R391)




gp:XXU13633


U13633 


Description 


IncJ plasmid R391 rumA(R391) and rumB(R391) genes, complete cds.






• 



ORF Name 



NT ID 



AAID 



— — Score Probability 
Length Length 



5275250 12 47 



5^T 



1.7e-53 



Protein name 



Description 



Locus Name 
sp:DHCT_METk!X 



Acc# 



Q59516 



RfiCOCrAtiJsl) (HUR-A) 



NT 



AA 



ORF Name 



NTID 



AAID 



7237787 cl 13 J" 



Length Length 
^7" 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 



— — Score Probability 



ORF Name 



NTID 



AAID Length Length 



16 7 64" 



£F7" 



5.ie-oy 



Protein name 



Description 



Locus Name 



Acc# 



gp:AB016260



Agrobacterium tumefaciens plasmid pTi-SAKURA, complete sequence.



NT 



AA 



ORF Name 



NTID 



T5FT 



AAID Length Length 
— 



F7S5" 



W5T 



Score Probability 
S.0e-26 



T51 



Protein name 



Locus Name 



coproporphyrinogen oxidase, III,
oxygen-independent hemN



pir:B69640



Acc# 



B69640 



Description 






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Zl 



Score Probability 
ir: 0010 



in 



Protein name 



Locus Name 



glycoprotein Vp26U-xiKe protein AIbL 



pir:T17508



Acc# 



T17508 



Description 



NT 



AA



ORF Name 



NTID 



AAID Length Length 



Score Probability 
6.1e-74



7TT 



Protein name 

metabolite transport protein homolog ywtG



Locus Name 



pir:E70070



Acc# 



E70070 



Description 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 




Score 


Probability 


25.6.12Mi....cX..zy. 


1566 


6788 


184 


555 








Protein name 








Locus 


Name 


Acc# 


Description 


















ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 




Score 


Probability 


26.1£A0AQ..±LM 


„ 1567 


6789


61 










Protein name 








Locus 


Name 


Acc# 


Description 
















NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 

Length 


Score 


Probability 


ll&bAZb.a.±±..:^. 


1568


6790 


503


1512 




124 


0.00028 



Protein name 



Locus Name 



STARP antigen 



1 .|gp:PPSTAKy 



Acc# 



Z26314 



Description 

P. falciparum gene for STARP antigen.






• 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



751406 ci 44 



7TT 



7T 



Protein name 



Description 



Locus Name 



sp:ATP6_ACACA



Acc# 



Q37385 



ORF Name 



$562501 41 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 
TTD 



Score Probability 



TUT 



Locus Name 



Acc# 



Description 
NO-HIT



ORF Name 



Protein name 



Description 



NTID 



AAID 



Score Probability

Length Length 



\ ±&6£Ati^...U....l I [I57T 



but 



1.6e-52



Locus Name 



ACC# 



X95938 



P. gingivalis rnhB & pgaA genes & orfs ibu, 197, ^02 & xyy.



NT 



ORF Name 



NTID 



AAID Length Length 



AA 

Score Probability



4.9e-65 



Protein name 



Locus Name 



2,3-bisphosphoglycerate-independent



gp:AF120090



Acc# 



AF120090 



Description 



Bacillus megaterium 2,3-bisphosphoglycerate-independent phosphoglycerate
mutase (pgm) gene, complete cds.



ORF Name 



NTID 



3613b:ill cl y 



TOT 



Protein name 



probable transport protein 



Description 



NT 



AA 



AAID Length Length 



Score Probability 



Locus Name 
pir:A75272



7.8e-42 



Acc# 



A75272 



NT ah. score Probability 



AA 



ORF Name 



NTID 



3.5.3.1D.i:Ab....tl...5i.......... 



AAID Length Length 
152 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT AA Score Probability



ORF Name 



NTID 



AAID Length Length 



[57TT 



F7T 



4.7e-36



Protein name 



Locus Name 



putative large secreted protein



1 |9Pv^ m 



Acc# 



AL117669 



Description



Streptomyces coelicolor cosmid Vlz.



ORF Name 



NTID 



Score Probability

AAID Length Length



[57W 



fT7TT 



r7T 



10 . 04^ 



Protein name 



Locus Name 



Acc# 



[gp:&PMAL^i>7 



Description 

Plasmodium falciparum MALiP'7, complete sequence.







NT 



AA 



ORF Name 



2434!i7b6 cl V 



NTID 
T5T7 



Score Probability
AAID Length Length 



[6799 " 



Protein name 



Locus Name 



Acc# 



Description 
NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



4S.£0.6M..±1..J. I 



95 



75" 



Protein name 



CT602 hypothetical protein



Locus Name 
pir:F72036



Acc# 



F72036 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 
— 



Score Probability 



T3T 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 



NTID 



Score Probability

AAID Length Length

3.0e-14



mu7 



7W 



TFTT 



Protein name 



Locus Name 



putative TonB-dependent outer membrane
receptor



gp:AF048749



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 




i2iiV076_ci__i5ii 


1581 




67 


204 










Protein name 








Locus 


Name 




Acc# 




Description 


















NO-HIT 




ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability


i2.5.0A40.2...tl...l£ 


1582 


, 


954 


2865


249 




4.1e-17



Protein name 



Locus Name 



putative histidine protein kinase



gp:REU82564



Acc# 



U82564 



Description 



hydrogenase-like protein small subunit (hoxB) gene, hydrogenase-like protein
large subunit (hoxC) gene, and putative histidine protein kinase (hoxJ) gene,
complete cds, and nickel permease (hoxN) gene, partial cds.



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 
1032 



Score Probability 
4.5e-33



35T 



Protein name 



Locus Name 



capsular polysaccharide biosynthesis homolog
yveT



pir:A70037



Acc# 



A70037 



Description 



ORF Name 



Protein name 



NTID 



1584 



NT 



AA 



AAID Length Length 



Score Probability 



76 



Locus Name 



Acc# 



Description 
NO-HIT






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



12909^ t'± lib 



2.9e-12



Protein name 




Locus Name 


Acc# 


hypothetical protein slr1861




pir:S77097


S77097 


Description 








ORF Name NTID AAID


NT 
Length 


AA 

Score
Length 


Probability 


mmfi^c2Jm 1586 6808




T551 207 


i.Se-ia | 


Protein name 




Locus Name 


Acc# 


putative flippase




gp:AF125164


AF125164


Description




Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes.






ORF Name NTID AAID


NT 
Length 


AA 

Score
Length 


Probability 


±±03A±&A...c>±J±^ isbv | 68 09 


150 


453 175


2.5e-13 


Protein name 




Locus Name 


Acc# 


j hypothetical protein l 




pir:S28678


S28678 


Description 








ORF Name NTID AAID


NT 
Length 


AA 

Score
Length 


Probability 


mS.Ml..cl...20.1 1588 6810




J1065 599 


2.9es-bB 


Protein name 




Locus Name 


Acc# 


mannose-1-phosphate guanylyltransferase


pir:H72303


H72303 



Description 




ORF Name NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


14567I35_cl_201 1589


6811 


369 


1110 


118 








Protein name 


Locus Name 




Acc# 
AF175714 




immunoreactive antigen FG32 


gp:AF175714






Description



Porphyromonas gingivalis strain W50 immunoreactive 43
complete cds.






ORF Name NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability


l46S834i>J:2_6b 1590 


6812


583 


1752 


135 






Protein name 








Locus Name 




ACC# 




I hypothetical protein SPAci7Gb . iyc 






pir:T37851




T37851 




Description 


















ORF Name NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


±±12.6.5±1.±±..±± 1591


6813 


105 


318 


161 


7.6e-l^ 


Protein name 








Locus Name 




ACC# 
S77093 




'hypothetical protein siribbb 








pir:S77093






Description 


















ORF Name NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 





15£2B^m.±l.A 1592


6814 


647 




1944 


979 




± . ots- y o 




Protein name 








Locus Name 




Acc# 












sp:CAPD_STAAU




P39853 




Description 






















NT 



AA 



ORF Name 



NTID 



15757007 c2 274 



AAID Length Length 
— 



Score Probability 
6.6e-29 



Protein name 



Cps1K



Locus Name 
gp:AF155804



Acc# 



AF155804 



Description 



Streptococcus suis strain bbbb Cps1E (cps1E) gene, partial cds; Cps1F
(cps1F), Cps1G (cps1G), Cps1H (cps1H), Cps1I (cps1I), and Cps1J (cps1J)
genes, complete cds; and Cps1K (cps1K) gene, partial cds.



NT 



AA 



ORF Name 



NTID 



AAID 



15S22S07 rl 2 



Length Length 
— 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


1.7.i^.0i..±l...dl 


1595 


6817


57 


204 


49 


0.037 



Protein name 



Locus Name 



probable RNA-directed DNA polymerase, reverse
transcriptase



pir:S20016



Acc# 



S20016 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
— 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



NTID 



Score Probability

AAID Length Length



19725412 c3 491" 



Protein name 

folylpolyglutamate synthase/dihydrofolate
synthase



Description 



1280 



9.2e-54



Locus Name 
pir:D72411



Acc# 



D72411 



ORF Name 



Protein name 



NTID 



T3W 



6820



NT 



AAID Length Length 
2^7 



Score Probability



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



2.i£.5.1bb.7....ci...JJ.&.. 



Protein name 



NT 



NTID 



AAID Length Length 



Score Probability



1599 



|I06b 



TZTT 



2.2e-0b 



hypothetical protein Ki^« 



Locus Name 



Acc# 



D71690 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



|ii4Stia^J....ca...i4tt I rrcro 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






ORF Name 



NT ID 



NT AA 

Score Probability
AAID Length Length



23594641 cl 155 



1601 



23TT 



2.9e-28



Protein name 



Locus Name 



putative UDP-N-acetyl-D-mannosamine
transferase



Acc# 
U09239 



Description 



Streptococcus pneumoniae type 19F capsular polysaccharide biosynthesis
operon, (cps19fABCDEFGHIJKLMNO) genes, complete cds, and aliA gene, partial
cds.



NT 



AA 



ORF Name 



NTID 



23632S02 £3 145 



AAID Length Length 
T7TT 



813 



Score Probability 
"512 



Protein name 



Description 



Locus Name 



gp:AB008550



Acc# 



AB008550 



Pseudomonas aeruginosa phage phi CTX, complete genome sequence.



NT 



AA 



ORF Name 



NTID AAID Length Length 

— 



Score Probability 




1.3e-17



Protein name 



Locus Name 



putative aminotransferase



gp:AF125164



Acc# 



AF125164 



Description 



Bacteroides fragilis 638R polysaccharide B (PS B2) biosynthesis locus,
complete sequence; and unknown genes. 



NT 



AA 



ORF Name 



\1±0M£±1..±1..±1& I [KM 



NTID AAID Length Length 



Score Probability 
l.fie-34 



T7^ 



Protein name 



Locus Name 



sp:YACO_BACSU



Acc# 
Q06753 



Description 

HYPOTHETICAL TRNA/RRNA METHYLTRANSFERASE YACO






NT 



AA 



ORF Name 



NTID 



AAID 



244125:17 tl '20 



Length Length 



Score Probability 



Protein name 
Description 



Locus Name 



Acc# 



NO-HIT



ORF Name NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


ll±±lB.B.l...a±J±M 1606


6828 


178 


537 


72 


" 0.048 


Protein name 






LOCUS 


Name 


Acc# 








sp:Y23b_METJA


Q57687 


Description



HYPOTHETICAL PROTEIN MJ023b


ORF Name NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


2.4,4i.7.b.b.U..±i.-120. 1607 


6829


63 


192 


71 


0.026


Protein name 






LOCUS 


Name 


Acc# 








sp:FLIT_BACSU


P39740 


Description



FLAGELLAR PROTEIN FLIT


ORF Name NTID 


AAID 


NT 
Length 


AA 

Length 


Score 


Probability 


ZM.7.5.3.3..7....a3....i^B... 1608 


6830 


80 


243 






Protein name 






LOCUS 


Name 


Acc# 


Description 
















NT 



AA 



ORF Name 



NTID 



AAID 



Protein name 

Description 
NO-HIT 



Score Probability

Length Length

STJI j p5 

Locus Name Acc# 



ORF Name 



NT ID 



AAID 



NT AA
Score Probability
Length Length 



2&6A2&11..±1..±&Z I fl^TO" 



TUT 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NT ID 



AAID 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



24fit5i5Q2...ci...2Lia I fnrrz 



6834 



Length Length 
TFB 



Score Probability 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 






NT 



ORF Name 



NTID 



AAID Length Length 



AA 

Score Probability 



24594187 cl 198 



T7W 



0.00060 



Protein name 



Locus Name 



lacunin



gp:AF078161



Acc# 



AF078161 



Description 



Manduca sexta lacunin mRNA, complete cds.





ORF Name 


NTID 


AAID 


NT 
Length 


AA 

Score
Length 


Probability 


24864003_t3_i3i 


1614 


6836


403 


1212 842 


5.2e-84 


Protein name 


Locus Name 


Acc# 


pantothenate metabolism flavoprotein


dfp


" pir:t>^878 


D69678 


homolog ylol :probable aspartate 
t rioparhnwl n^f* antivase 








Description 












ORF Name 


NTID 


AAID 


NT 
Length 


AA 

Score
Length 


Probability 




1615


6837 


455 


1368 




Protein name 








Locus Name 


Acc# 


Description 












NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 

Score
Length 


Probability 




1616 


6838 


" 418 


1257 862


4.0e-»fo 


Protein name 








Locus Name 


Acc# 



Description 

Propionigenium modestum mmdD, mmdC, mmdB genes and partial mmdA gene.






NT 



AA 



ORF Name 



NT ID 



2562543a c± 190 



TZTT 



AAID Length Length 
TTT7 — 



Score Probability 
0.0069



TOT" 



Protein name 



Locus Name 



Acc# 



transmembrane protein 



|gp:YacJPTM 



L11895 



Description 

Saccharomyces cerevisiae putative transmembrane protein (PTM1) gene,
complete cds.



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


262lO^J:l_iO 


1618 


6840 


393


1182 


252 


3.3e-36 



Protein name 



Locus Name 



sensory transduction system regulatory
protein slr1983 :protein slr1983 :protein
slr1983



pir:S75664



Acc# 



S75664 



Description 



ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 




Score 


Probability 


l£MA±L'2....a2Jl&l 


"... 1619 


6841 


159 


480 








Protein name 








Locus 


Name 


Acc# 


Description 
















NO-HIT


ORF Name 


NTID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


2.6.3.S.M0.O^J:2^8.1 


1620


6842 


154 


465




91 


0.006V 



Protein name 



positive regulator for virulence factors



Locus Name 
| gp:0Luokl'I 



ACC# 



D14877 



Description 

Clostridium perfringens virR gene for positive regulator for virulence
factors, complete cds.



ORF Name 



Protein name 



NTID 



6843 



hypothetical protein AF0417 



Description 



NT 



AA 



AAID Length Length 



Score Probability 



K7T 



Locus Name 



pir:A69302



b.ye-OS 



Acc# 
A69302 



NT 



AA 



ORF Name 



NTID 



AAID 



16£.8.119±.±1..±±L. I 11522 



Length Length 



Score Probability 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



2.fiai5ftftL..cl...lft7. I [T^T 



NTID AAID Length Length 
OTB 



H5U" 



57T 



Score Probability 
213 



2.4e-17



Protein name 



Locus Name 



unknown 



gp:AF048749



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



NT 



AA 



ORF Name 



NTID 



12L12±\l.±2..M. 



AAID Length Length 
— 



Score Probability 



T£T5TT 



Protein name 



Locus Name 



2',3'-cyclic nucleotide 2'-phosphodiesterase



gp:AB028630 



Acc# 



AB028630 



Description 



Clostridium perfringens hyp27, bacH, ptp, cpd genes for hypothetical
protein, bacterial hemoglobin, protein-tyrosine phosphatase, 2',3'-cyclic
nucleotide 2'-phosphodiesterase, partial and complete cds.






NT 



AA 



ORF Name 



NT ID 



12843255 t2 ±24 



AAID Length Length 

\z&n — 



Score 



pur 



Probability 
1.1e-05



Protein name 



Locus Name 



gp:AF136495



Acc# 



AF136495 



Description 



Campylobacter lari GlyA (glyA) gene, partial cds.




ORF Name NTID 


AAID 


NT 
Length 


AA 

Score
Length 


Probability 


2§3l557J:l_4 1626


6848


258 


777 217 


8.9e-i§ 



Protein name 



Locus Name 



probable DNA pol III epsilon chain



pir:B71536



Acc# 



B71536 



Description 



NT 



ORF Name 



NTID 



AAID Length Length 



AA 

Score



1627 



TTUT 



Probability 
TTTe^TZ 



Protein name 



Locus Name 



galactosyl transferase



gp:SPN239004



Acc# 



AJ239004 



Description 

Streptococcus pneumoniae type a capsular gene cluster. 



NT 



AA 



ORF Name 



NTID 



AAID 



1628 



"6 8 SO"" 



Length Length 



Score Probability 



TTT7" 



Protein name 



Locus Name 



Acc# 



Description 






ORF Name 


NT ID 


AAID 


NT 
Length 


AA 

Score
Length 


Probability 


3184912SJ:3_13:4 


1629 






897 420 


2.7e-39


Protein name 








Locus Name 


Acc# 


DNA repair protein








"| pir:A7Wyl 


A75391 


Description 


ORF Name 


NTID 


AAID 


NT 
Length 


AA 

Score
Length 


Probability 


120AOKLb..±Z...±±2 


1630 


6852 


262


792 394


1.6e-36 


Protein name 








Locus Name 


Acc# 



sp:RECN_ECOLI



Description 

DNA REPAIR PROTEIN RECN (RECOMBINATION PROTEIN N)



ORF Name 


NTID 


AAID 


NT AA Score
Length Length 


Probability 


126&±6.21^t±^ 


1631 


6853 


151 456 128


2.4e-08 



Protein name 



Locus Name 
sp:DP3B_VIBHA



Acc# 



P52620 



Description 

DNA POLYMERASE III, BETA CHAIN (FRAGMENT)



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 
T5T 



Score Probability 



Protein name 



Locus Name 



Acc# 



Description 



NO-HIT






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



33788882 cl 2l2 



F8175~ 



TZW~ 



1.2e-3< 



Protein name 



Locus Name 



conserved hypothetical protein aq_^74



pir:C70325



Acc# 



C70325 



Description 









NT 


AA 


Score 


Probability 


ORF Name 


NTID 


AAID 


Length 


Length 






MD..7.D.3.11...ci...ly.l... 


1634 


6856


347 


1044 


132 




7.0e-06 



Protein name 



transmembrane protein 



Locus Name 
gp:SPAJ6986



Acc# 



AJ006986 



Description 

Streptococcus pneumoniae type 33F DNA, capsular gene cluster.







NT AA 

Score


Probability 


ORF Name NTID AAID 




Length Length 


3.5.7.SA:/.6.^t^6.1 1635 6857 


JIG 551 593 


1.3e-b7 


Protein name 


reductase


Locus Name
gp:BPE238308


Acc# 
AJ238308 



Description 

Bordetella pertussis partial gene for putative thioesterase, tRNA-Gly, murB,
dapB, omlA genes and partial fur gene.



NT 



ORF Name 



NTID 



l$±l£A±..±'2...&0. I 



AAID Length Length 



AA 

Score Probability



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT






NT 



AA 



ORF Name 



jy43802 £2 66 



NTID AAID Length Length 

— 



TZTT 



TUT 



Score Probability 
313 



4.0e-31



Protein name 



Locus Name 



gp:AF095578



Acc# 
AF095578 



Description 



Salmonella typhimurium YjgF (yjgF) gene, complete cds; and unknown gene.



NT 



AA 



ORF Name 



NTID 



3544687 ±3 143 



Score Probability
AAID Length Length

6860 



5^" 



3.3e-18



Protein name 



Locus Name 



hypothetical protein AF0417 



pir:A69302



Acc# 



A69302 



Description 



NT 



AA 



ORF Name 



\it)£A0M..±lJl& I im? 



NTID AAID Length Length 

m^i — 



Score Probability 

m 



0.0014 



Protein name 



Locus Name 



probable integral membrane protein



pir:T37050



Acc# 



T37050 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



6862 



Length Length 



Score Probability 



TTTT 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



ORF Name 



NTID 



AAID 



NT AA 
Score Probability
Length Length 



T55" 



| 7.2e-lfe 



Protein name 



Locus Name 



serine acetyltransferase



pir:G72349



Acc# 



G72349 



Description 






NT 



ORF Name 



NT ID 



AAID Length Length 



AA 

Score Probability



'1145067 c4 341 



3.7e-19



Protein name 



Locus Name 



serine acetyltransferase



pir:G72349



Acc# 



G72349 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



T£TT" 



14 . 6e-0b 



Protein name 



Description 



Locus Name 
sp:Y143_SYNY3



Acc# 
P74442 



HYPOTHETICAL WD-REPEAT PROTEIN SLR0143



NT 



AA 



ORF Name 



NTID 



AAID 



1644



Length Length 
FTS" 



Score Probability



Protein name 



Description 



Locus Name 



Acc# 



NT 



ORF Name 



NTID 



AAID 



Length Length 



AA 

Score



2442 



Probability 
5.2e-74 



Protein name 



Description 



Locus Name 



sp:BACA_BACLI



Acc# 



O68006






ORF Name 



14773251 ci 196 



Protein name 
Description 



NT 



AA 



NTID 



AAID 



Length Length 
7T3 



Score Probability 
0.017 



103 



Locus Name 



sp:YJBH_ECOLI



Acc# 



P32689 



ORF Name 



I48021S8 ci 4ofe 



Protein name 



NTID 



AAID 



NT AA 

Score Probability
Length Length 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



probable lipopolysaccharide
N-acetylglucosaminyl transferase, rfbU



Description 



Score Probability



AAID Length Length 



409 



1230 



Locus Name 



pir:F64500



1 . 4e-oy 



Acc# 



F64500 



ORF Name 



ABMA5.D..±l...b:2. 



Protein name 



NTID 



phnP protein (phnP) homolog



Description 



Score Probability



AAID Length Length 



7ZW 



Locus Name 



pir:D70166



2.6e-43



Acc# 
D70166 






NT 



AA 



ORF Name 



NTID 



AAID Length Length



Score Probability 



Protein name 

oxaloacetate decarboxylase, subunit alpha
(oadA) homolog



Locus Name 



Description 



pir:C69406



4.4e-47



Acc# 



C69406 



NT 



AA 



ORF Name 



NTID 



AAID 



Length Length 
T5B 



Score Probability 
300 



1.4e-26



Protein name 



Locus Name 



sp:DP^J^WKJ 



Acc# 



P13455 



Description 

DNA POLYMERASE III, BETA CHAIN






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



S.0M5:n.±ZJlh 



'8.3e-iil 



Protein name 



Locus Name 



putative histidine protein kinase



gp:REU82564



Acc# 



U82564 



Description 



hydrogenase-like protein small subunit (hoxB) gene, hydrogenase-like protein
large subunit (hoxC) gene, and putative histidine protein kinase (hoxJ) gene,
complete cds, and nickel permease (hoxN) gene, partial cds.



ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 


S26M.^L.aL.±9.1 


1653


6875


440 1323 


108 


2.8e-Ub 



Protein name 

Description 
FERREDOXIN 



Locus Name 



sp:FER_METBA



Acc# 



P00202 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



TIT 



l . oe-41 



Protein name 



beta-1,4-galactosyltransferase



Locus Name 
gp:S!XJi>fcJ14l!i 



Acc# 



X85787 



Description 

S. pneumoniae cps14 locus.



NT 



AA 



ORF Name 



NTID 



AAID 



S04S4S2 11 21 



Length Length 
2T7 



Score Probability 



7F 



Protein name 
Description 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



|£Al4m...cl...l8.&... ....I 11555 



Length Length 
1218 



Score Probability 
|i.Se-07 



Protein name 



Locus Name 



NADH dehydrogenase (ubiquinone), 39 kDa
subunit homolog



pir:H69478



ACC# 
H69478 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



££AA4:lb....a^.l6.h..... 



JUT 



2..7e-iy 



Protein name 

hypothetical protein snu744 



Locus Name 



pir:S77079



Acc# 



S77079 



Description 






ORF Name 



Protein name 






NTID 



NT 



AA 



AAID Length Length 
TW£ 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 

— 



Score Probability 



T2TT 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



NT 



AA 



AAID Length Length 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



Protein name 



NTID 



NT AA 

Score Probability
AAID Length Length 



6883 



T7T 



1.0e-40



Locus Name 



capsular polysaccharide biosynthesis homolog
yveT



pir:A70037



Acc# 



A70037 



Description 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



1662 



6884



T7T" 



Protein name 



Locus Name 



phosphate starvation inducible protein
homolog ylaK



Description 



pir:A69873



1.7e-76 



Acc# 
A69873 



ORF Name 



Protein name 



Description 



NT 



AA 



NTID 



iam^i..±^iJibi I rreCT 



AAID Length Length 
[2TT5 



Score Probability 



[7TT 



Locus Name 



Acc# 



NO-HIT



ORF Name 



Protein name 



Description 



NT 



AA 



NTID 



AAID 



1664 



6886 



Length Length 
FT77 



Score Probability 



TEW 



Locus Name 



Acc# 



NO-HIT



ORF Name 



Protein name 



NT 



AA 



NTID 



AAID Length Length 



Score Probability 



SF27- 



ie-11 



Locus Name 



rioonuclease h, I" 



Acc# 



JC5787 



Description 






NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



1072tm7 ti 7b 



TOFF" 



636 



4.0e-22 



Protein name 



Description 



Locus Name 



Acc# 



P37261 



HYPOTHETICAL 21.1 KD PROTEIN IN Ft^l-AC^Pl INTERGENIC REGION



NT 



AA 



ORF Name 



NTID 



AAID Length Length 
T22I — 



TOST 



EES" 



Score Probability 
2.2e-07



Protein name 



Locus Name 



hypothetical protein BBI16 



pir:G70241



Acc# 



G70241 



Description 



ORF Name 



NTID 



AAID 



1668 



TOW 



Protein name 



DNA topoisomerase III topB 



Description 



Score Probability
Length Length



2145 



TuTF" 



1.9e-102 



Locus Name 



pir:H69724



Acc# 



H69724 



ORF Name 



Protein name 



NTID 



1655 



AAID 



TOST" 



Score Probability
Length Length 



TFT" 



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



119.242LD.5....a3....b.iy.., 



Protein name 



NTID 



TTOTT 



AAID 



TO3T 



NT 



AA 



Length Length 
TT3 



Score Probability 



TTTT 



Locus Name 



Acc# 



Description 



NO-HIT






ORF Name 



NT ID 



NT AA 

Score Probability
AAID Length Length 



1212762 c2 390 



1671 



6893 



2.7e-32 



Protein name 



Description 



Locus Name 



gp:AB012957



Acc# 



AB012957



Vibrio cholerae genes for O-antigen synthesis, strain O22, complete cds.



NT 



AA 



ORF Name 



1225S425 el iyl 



NT ID AAID Length Length 

— 



1 I5D3 



Score Probability 
5.5e-18



Protein name 



Locus Name 



putative glycosyl transferase



gp:AF048749



Acc# 



AF048749 



Description 



Bacteroides fragilis capsular polysaccharide biosynthesis operon, complete
sequence.



NT 



AA 



ORF Name 



NT ID AAID Length Length 

STT7 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



1.3e-81



Protein name 



Locus Name 



probable swi/snf helicase



pir:E71481 



Acc# 



E71481 



Description 






ORF Name 


NT ID 




AAID 


NT 
Length 




AA 
Length 


Score 


Probability 


I3885212_J:3_220 






597 


388 


1167 






Protein name 












Locus Name 


Acc# 


Description 


















NO-HIT


ORF Name 


NT ID 




AAID 


NT 
Length 




AA 
Length 


Score 


Probability 


l&±±6.6.±b..±2..±Z.& 


1676 


61 




416 


1251






Protein name 












Locus Name 


Acc# 


Description 


















NO-HIT


ORF Name 


NTID 




AAID 


NT 
Length 




AA 
Length 


Score 


Probability 


li.7.16.2.b.2...cl...b.ib. 


1677 


6899


427 


1284 


241 


7.6e-20 


Protein name 


Locus Name 


Acc# 


MocB (Tn4399) 


pir:B48487


B48487 


Description 


ORF Name 


NTID 




AAID 


NT 
Length 




AA 
Length 


Score 


Probability 


15.D.17.2a:/...±3....25.2. 


1678 


6900 


837 


2514 


391 


2.2e-32 


Protein name 












Locus Name 


Acc# 


enhanced entry protein EnhC










gp:AF057704


AF057704 



Description 



Legionella pneumophila EnhA (enhA), EnhB (enhB), and enhanced entry protein
EnhC (enhC) genes, complete cds.






ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


16I358B6J:3_lyi 


1679 


6901


103 


312 






Protein name 








Locus 


Name 


Acc# 


Description 














NO-HIT


ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 


±6229.&2S..±2..±2± 


1680 


6902 


280 


843 


105 


" 0.00081 


Protein name 








Locus 


Name 


Acc# 










sp:YS21_BORBU




Description 














HYPOTHETICAL PROTEIN Bhb'Al 












ORF Name 


NTID 


AAID 


NT AA 
Length Length 


Score 


Probability 


i£mm..±i...iaa 


1681 


6903 


1951 


5856 


1011


1.5e-118 



Protein name 



Description 



Locus Name 



Acc# 



gp:AB016260



Agrobacterium tumefaciens plasmid pTi-SAKURA, complete sequence.



ORF Name 


NTID 


AAID 


NT 
Length 


AA 

Score
Length 


Probability 


I6.a3.2as.5...±2...i6.a 


1682 


6904 


450 


1353 1713 


2.6e-176 


Protein name 








Locus Name 


Acc# 



hypothetical protein



pir:JQ1020



JQ1020 



Description 






ORF Name 



NTID 



AAID 



Score Probability
Length Length 



19562660 t2 ±U 



1683 



W7TT 



1437 



TTT 



0.0074 



Protein name 



Locus Name 



ES/130 



gp:AF006751



Acc# 



AF006751 



Description 



Homo sapiens ES/130 mRNA, complete cds.



ORF Name 



lS7l3l ti 35 



Protein name 



NTID 



NT 



AA 



AAID Length Length 

T&m — 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



-L$.$ASA02..±±...5.1 



Protein name 



NTID 



AAID 



NT 



Length Length 



AA 

Score Probability



Locus Name 



Acc# 



Description 



NO-HIT



ORF Name 



±$.$.S.$AX.±1..£& 



Protein name 



transposase 



Description 



NT 



NTID 



AAID Length Length 



AA 

Score Probability



I04y 



6.1e-106



Locus Name 



gp:AF038866



Acc# 



AF038866



Bacteroides fragilis transposon Tn5520 transposase (bipH) and mobilization
protein BmpH (bmpH) genes, complete cds.






ORF Name 



'20213132 c2 406 



Protein name 






NTID 



AAID 



NT AA 

Score Probability
Length Length 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



T5W 



AAID 



NT 



AA 



Length Length 

m 1 [233 — 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



^TI- 



NT 



AA 



Length Length 
TT5" 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 



ORF Name 



Protein name 



NTID 



AAID 



NT 



AA 



Length Length 

prui — 



Score Probability 



Locus Name 



Acc# 



Description 



NO-HIT 






NT 



AA 



ORF Name 



NT ID 



AAID Length Length 



Score Probability 



rl b 



y .4e-07 



Protein name 



Description 



Locus Name 



sp:M49_STRPY



Acc# 



P16947 



ORF Name 



NT ID 



NT AA 

Score Probability
AAID Length Length 



21S1S632 ti 17 



3809 



Protein name 



Locus Name 



tetracycline resistance element regulator 
RteA 



Description 



pir:A41860



Acc# 



A41860 






ORF Name 



Protein name 



Description 



NO-HIT



NT 



AA 



NTID AAID Length Length 

TT7 



Score Probability 



Locus Name 



Acc# 



ORF Name 



Protein name 



NT 



AA 



NTID AAID Length Length 

wm — 



Score Probability 



5516 



UT 



Locus Name 



Acc# 



Description 



NO-HIT 






NT 



AA 



ORF Name 



rl 45 



NTID 




AAID 



Length Length 
T7T 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



121S.9A&1...C1..A&& I 



AAID Length Length 

— 



TSTT 



453 



Score Probability 
T3T3 



1.5e-08



Protein name 



Description 



Locus Name 



gp:APU72238



Acc# 



U72238 



Anabaena PCC7120 ORFR1, ORFR2, ORFR3, ORFR4, and ORFR5 genes, complete

sequences.



ORF Name 



NTID 



NT AA 

Score Probability
AAID Length Length



ai&fl7.&is.„ci...a2a I itst7 



OUT 



4.3e-05



Protein name 



Locus Name 



phage abortive infection protein



pir:T30326



ACC# 



T30326 



Description 



NT 



AA 



ORF Name 



226£$£l±...a2...1&£ 



NTID AAID Length Length 




1698



Score Probability 
4.7e-122



T2uT" 



Protein name 



Locus Name 



UDP-galactopyranose mutase 



gp:SPAJ6986



Description 

Streptococcus pneumoniae type 33F DNA, capsular gene cluster.



Acc# 



AJ006986 






ORF Name 


NT ID 


AAID 


NT 

Length 


AA 
Length 


Score 


Probability 


22774087J:2jL26 






2^6 741 


85 


0.0034 


Protein name 








Locus Name 


Acc# 


non-structural 5a protein






gp:HCU56570


U56570 


Description 




Hepatitis C virus isolate 925821 non-structural 5a (NS5a) gene, partial cds.




ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


22860128J:3_2b8 


1700 


S£22 


83 252 


64 


0.031 


Protein name 








Locus Name 


Acc# 










sp:SPRC_XENLA


P36378 


Description 














(OSTEONECTIN) (ON) (BASEMENT MEMBRANE PROTEIN BM-40)






ORF Name 


NT ID 


AAID 


NT 
Length 


AA 
Length 


Score 


Probability 


2.2.8i>.2M.b...±^..llb. 


1701 


6923 


452 1359 


2053 


1.4e-216 



Protein name Locus Name Acc# 



gp : BMRRTJilAU 



Description 

Bacteroides thetaiotaomicron rteA and rteB genes involved in production of
plasmid-like forms, complete cds, and tetQ gene, 3' end.



ORF Name 


NT ID AAID 


NT 
Length 


AA 

— , Score 
Length 


Probability 


llAlltbA.±l..±&&. 


1702 6324 


433 1302 160 


2.4e-08


Protein name 




Locus Name 


Acc# 


actin binding protein MAYVEN


gp:AF059569


AF059569 



Description 



Homo sapiens actin binding protein MAYVEN mRNA, complete cds.






NT 



AA 



ORF Name 



NTID 



AAID 



23452786 i2 U2 



IT7UT 



Length Length 
WB 112^5 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length ^ 



naa&aii-ta-iaa I ittw 



TT 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT



NT 



AA 



ORF Name 



NTID 



AAID 



116A<L5.5.2.±±J±±... 



Length Length 



Score Probability 
FT7 



Protein name 



Description 



Locus Name 



gp:BFU63096



Acc# 



U63096 



Bacteroides fragilis (bctA) gene, complete cds.



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length 



amcim...ci.„3L2Lj. i ittm 



81 



Protein name 



Locus Name 



Acc# 



hypothetical protein



gp:AF036485 



Description 

Plasmid pNZ4000, complete sequence.






NT 



AA 



ORF Name 



.24251937 tl i 



NTID 
T7TT7 



AAID 



Length Length 
33 



Score Probability 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



T7W 



6930 



Length Length 
322 



Score Probability 
T7TT 



5.4e-34 



Protein name 

Description 
GENERAL STRESS PROTEIN A 



Locus Name 



sp:GSPA_BACSU



Acc# 



P25148 



NT 



AA 



ORF Name 



NTID 



AAID 



243.4.7.u9.0...±l...;U.. 



1709 



^3IT 



Length Length 
T33 1 



Score Probability 



Protein name 

Description 
NO-HIT



Locus Name 



Acc# 



ORF Name 



NTID 



AAID 



NT AA 

— , — , Score Probability 
Length Length — 



2.^iaaiz...ti...5.a 



TTTTT 



Protein name 

Description 
NO-HIT 



Locus Name 



Acc# 



NT 



AA 



ORF Name 



NTID 



AAID 



24415885 t3 171 



T7TT" 



Length Length 
FO 



Score Probability 



Protein name 



Description 



Locus Name 



Acc# 



NO-HIT 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



T7TT 



1ST 



9.0e-25 



Protein name 



Locus Name 



ribonuclease III (rnc) homolog



pir:H70187



Acc# 



H70187 



Description 



NT 



AA 



ORF Name 



NTID 



AAID 



Z] 11713 



Length Length 

tub — 



TUT" 



Score Probability 

o-.ooia — 



Protein name 



Locus Name 



vrll protein



pir:T17388



Acc# 



T17388 



Description 



NT 



AA 



ORF Name 



NTID 



AAID Length Length 



Score Probability 



\2&6.126&l.±1...2b.h. 



TTTT 



2.5e-154 



Protein name 



Locus Name 



arginine decarboxylase, 2 : protein
slr0662 :protein slr0662



pir:S76771 



Acc# 



S76771 



Description 



ORF Name 



Protein name 



NTID 



r7T5~ 



complement C3 precursor



Description 



NT 



AA 



AAID Length Length 



Score Probability 



TTT 



Locus Name 



pir:C3HU



0.00018 



Acc# 






